Awesome Open Source

The Top 23 Crawler Open Source Projects

Open source projects categorized as Crawler

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | most recent commit 3 months ago

PyPI package: Scrapy

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8 | total releases 40 | most recent commit 22 days ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | most recent commit a month ago

Easyspider ⭐ 20,149

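A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser-automation testing, data-collection, and web-crawling application that can, without code, graphically…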

most recent commit 21 days ago

Proxy_pool ⭐ 19,442

A Python proxy pool (ProxyPool) for web spiders.

most recent commit 4 months ago

Pyspider ⭐ 15,943

A powerful spider (web crawler) system in Python.

dependent packages 2 | total releases 17 | most recent commit 10 months ago

PyPI package: pyspider

Newspaper ⭐ 13,147

News, full-text, and article metadata extraction in Python 3. Advanced docs:

dependent packages 97 | total releases 18 | most recent commit 7 months ago

PyPI package: newspaper3k

Examples Of Web Crawlers ⭐ 13,142

Some very interesting Python crawler examples that are friendly to beginners, mainly scraping Taobao, Tmall, WeChat, WeChat Read, Douban, QQ, and more.

most recent commit 4 months ago

Tushare ⭐ 12,165

TuShare is a utility for crawling historical data on Chinese stocks.

most recent commit a year ago

Crawlee ⭐ 12,106

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

dependent packages 42 | total releases 747 | most recent commit 16 hours ago

npm package: crawlee

Webmagic ⭐ 11,080

A scalable web crawler framework for Java.

dependent packages 22 | total releases 25 | most recent commit 3 months ago

Crawlab ⭐ 10,521

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

total releases 1 | most recent commit 3 months ago

Photon ⭐ 10,244

Incredibly fast crawler designed for OSINT.

total releases 18 | most recent commit 4 months ago

Python ⭐ 9,097

Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat Official Accounts, and remote power-on.

most recent commit 6 months ago

Avbook ⭐ 8,777

AV movie management system; crawlers for avmoo, javbus, and javlibrary; an online AV film library and AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

most recent commit a year ago

Spider Flow ⭐ 8,075

A next-generation crawler platform that defines crawl flows graphically, so a crawler can be built without writing code.

most recent commit 10 months ago

Katana ⭐ 7,995

A next-generation crawling and spidering framework.

dependent packages 1 | total releases 8 | most recent commit 3 months ago

Infospider ⭐ 6,856

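INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place, aiming to help users get their own data back safely and quickly. The tool's code is open source and the process is transparent…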

most recent commit 7 months ago

Node Crawler ⭐ 6,610

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

dependent packages 146 | total releases 31 | most recent commit 4 months ago

npm package: crawler

Awesome Web Scraping ⭐ 6,060

List of libraries, tools and APIs for web scraping and data processing.

most recent commit 5 months ago

Awesome Crawler ⭐ 5,859

A collection of awesome web crawlers and spiders in different languages.

most recent commit 5 months ago

Wechatsogou ⭐ 5,777

A crawler interface for WeChat Official Accounts, based on Sogou WeChat Search.

total releases 25 | most recent commit 5 months ago

Ferret ⭐ 5,540

Declarative web scraping

dependent packages 5 | total releases 56 | most recent commit 4 months ago
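Most general-purpose frameworks in the list above (Scrapy, Colly, Pyspider, Webmagic, Crawlee, and others) are broadly built around the same loop: fetch a page, extract its links, and queue URLs that have not been seen yet. The following is a minimal standard-library Python sketch of that breadth-first loop, not the API of any listed project. The `fetch` hook, the function names, and the same-host restriction are hypothetical choices made for illustration; real frameworks add concurrency, retries, robots.txt handling, and throttling on top.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> list[str]:
    """Return absolute link URLs found in `html`, with #fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(start_url: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl limited to start_url's host; returns URLs in visit order.

    `fetch` is a hypothetical hook that returns a page's HTML, or None on failure.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # skip pages that could not be fetched
            continue
        visited.append(url)
        for link in extract_links(url, html):
            # Stay on the starting host and never enqueue the same URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For an offline run, `fetch` can simply be the `.get` method of a dict of canned pages, e.g. `crawl("https://example.com/", pages.get)`; against live sites it could wrap `urllib.request.urlopen` with error handling and a delay between requests.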

The 10 Most Depended On Crawler Open Source Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | latest release September 18, 2023 | most recent commit 3 months ago

PyPI package: Scrapy

Node Rate Limiter ⭐ 1,444

A generic rate limiter for Node.js. Useful for API clients, web crawling, or other tasks that need to be throttled.

dependent packages 329 | total releases 16 | latest release May 19, 2021 | most recent commit 8 months ago

npm package: limiter

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | latest release March 08, 2022 | most recent commit a month ago

Linkinator ⭐ 955

🐿 Scurry around your site and find all those broken links.

dependent packages 313 | total releases 106 | latest release November 22, 2023 | most recent commit 3 months ago

npm package: linkinator

Static Site Generator Webpack Plugin ⭐ 1,538

Minimal, unopinionated static site generator powered by webpack

dependent packages 214 | total releases 21 | latest release November 19, 2018 | most recent commit 5 years ago

npm package: static-site-generator-webpack-plugin

Node Crawler ⭐ 6,610

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

dependent packages 146 | total releases 31 | latest release December 30, 2022 | most recent commit 4 months ago

npm package: crawler

A Devtools driver for web automation and scraping

dependent packages 140 | total releases 406 | latest release November 06, 2023 | most recent commit 3 months ago

Puppeteer Sharp ⭐ 3,135

Headless Chrome .NET API

dependent packages 120 | total releases 78 | latest release December 05, 2023 | most recent commit 15 days ago

NuGet package: PuppeteerSharp

Crawler Detect ⭐ 1,842

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

dependent packages 110 | total releases 156 | latest release July 21, 2023 | most recent commit 5 months ago

Packagist package: jaybizzle/crawler-detect

Zhihu Api ⭐ 267

Unofficial API for Zhihu.

dependent packages 103 | total releases 39 | latest release July 16, 2017 | most recent commit 7 years ago

npm package: zhihu-api
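Node Rate Limiter, listed above, is meant for throttling tasks such as web crawling. As a rough illustration of the idea, here is a token-bucket limiter written in Python; it is not the `limiter` package's JavaScript API. Tokens refill at a fixed rate up to a burst capacity, and each request spends one. The class name, its parameters, and the injectable `clock` are hypothetical choices for this sketch.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Illustrative token-bucket limiter: `rate` tokens per `per` seconds,
    with bursts of up to `capacity` tokens (defaults to `rate`)."""

    def __init__(
        self,
        rate: float,
        per: float = 1.0,
        capacity: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,  # hypothetical hook, injectable for testing
    ) -> None:
        if rate <= 0 or per <= 0:
            raise ValueError("rate and per must be positive")
        self.refill_per_sec = rate / per
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def try_acquire(self, n: float = 1.0) -> bool:
        """Spend n tokens and return True if available; otherwise return False."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n: float = 1.0) -> None:
        """Sleep until n tokens are available, then spend them.

        Assumes the default real clock, since it waits with time.sleep().
        """
        if n > self.capacity:
            raise ValueError("n exceeds bucket capacity")
        while not self.try_acquire(n):
            time.sleep((n - self.tokens) / self.refill_per_sec)
```

A crawler would call `bucket.acquire()` before each request; for example, `TokenBucket(rate=2)` allows about two requests per second with bursts of two. Keeping `try_acquire` non-blocking lets an event loop or scheduler decide for itself what to do when the budget is spent.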

The 10 Latest Releases In Crawler Open Source Projects

Taobao_bra_crawler ⭐ 189

A Taobao web crawler, just for fun.

latest release December 26, 2023 | most recent commit 5 years ago

Jmcomic Crawler Python ⭐ 247

Python API for JMComic | Provides Python API access to JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀

total releases 102 | latest release December 19, 2023 | most recent commit 3 months ago

The fastest web crawler written in Rust. Maintained by @a11ywatch.

dependent packages 5 | total releases 368 | latest release December 11, 2023 | most recent commit 3 months ago

Cargo package: spider

Comiccrawler ⭐ 251

An image crawler written in Python.

total releases 175 | latest release December 10, 2023 | most recent commit 3 months ago

Crawlee ⭐ 12,106

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

dependent packages 42 | total releases 747 | latest release December 10, 2023 | most recent commit 16 hours ago

npm package: crawlee

Pagemunch Php ⭐ 6

A PHP library for the PageMunch web crawler API

latest release December 10, 2023 | most recent commit 7 years ago

Crawley ⭐ 208

The unix-way web crawler

total releases 60 | latest release December 07, 2023 | most recent commit 5 months ago

Google News Scraper ⭐ 144

Lightweight scraper for Google News

dependent packages 2 | total releases 12 | latest release December 06, 2023 | most recent commit 4 months ago

npm package: google-news-scraper

Puppeteer Sharp ⭐ 3,135

Headless Chrome .NET API

dependent packages 120 | total releases 78 | latest release December 05, 2023 | most recent commit 15 days ago

NuGet package: PuppeteerSharp

Xalpha ⭐ 1,851

A backtesting engine for fund investment management.

dependent packages 1 | total releases 53 | latest release December 04, 2023 | most recent commit 5 months ago

PyPI package: xalpha

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2024 Awesome Open Source. All rights reserved.