Awesome Open Source
The Top 354 Crawler Open Source Projects
Scrapy
⭐
39,919
Scrapy, a fast high-level web crawling & scraping framework for Python.
Pyspider
⭐
14,899
A powerful spider (web crawler) system in Python.
Annie
⭐
14,227
👾 Fast, simple and clean video downloader
Colly
⭐
13,217
Elegant Scraper and Crawler Framework for Golang
Proxy_pool
⭐
11,835
Proxy IP pool for Python crawlers
Newspaper
⭐
10,762
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Examples Of Web Crawlers
⭐
9,874
Some interesting, beginner-friendly Python crawler examples, mainly scraping sites such as Taobao, Tmall, WeChat, Douban and QQ.
Webmagic
⭐
9,637
A scalable web crawler framework for Java.
Photon
⭐
7,649
Incredibly fast crawler designed for OSINT.
Avbook
⭐
7,605
AV movie management system: crawlers for avmoo, javbus and javlibrary, an online AV film library and an AV magnet-link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)
Crawlab
⭐
7,567
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Python
⭐
6,454
Python scripts: simulated login to Zhihu, crawlers, Excel manipulation, WeChat Official Accounts, remote power-on.
Node Crawler
⭐
5,780
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Wechatsogou
⭐
4,935
Crawler interface for WeChat Official Accounts, based on Sogou WeChat Search
Headless Chrome Crawler
⭐
4,923
Distributed crawler powered by Headless Chrome
Scrapy Redis
⭐
4,798
Redis-based components for Scrapy.
Haipproxy
⭐
4,740
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Awesome Crawler
⭐
4,462
A collection of awesome web crawlers and spiders in different languages
Ferret
⭐
4,409
Declarative web scraping
Autoscraper
⭐
3,359
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Dom Crawler
⭐
3,252
The DomCrawler component eases DOM navigation for HTML and XML documents.
Scylla
⭐
3,174
Intelligent proxy pool for Humans™ (Maintainer needed)
Toapi
⭐
3,109
Every web site provides APIs.
Dotnetspider
⭐
2,910
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient and fast high-level web crawling & scraping framework.
Arachni
⭐
2,840
Web Application Security Scanner Framework
Ecommercecrawlers
⭐
2,498
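Hands-on 🐍 crawlers for many websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat Official Accounts, Dianping, Qichacha, recruitment sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Ibaotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, text collection for machine learning, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban movie reviews, Ctrip, Xiaomi App Store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project: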
Proxybroker
⭐
2,414
Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
Googlescraper
⭐
2,250
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Querylist
⭐
2,240
🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
Gecco
⭐
2,157
Easy-to-use, lightweight web crawler
Lianjia Beike Spider
⭐
2,086
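Housing-price crawler for Lianjia and Beike: collects housing-price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou and Shenzhen. Stable, reliable and fast! Supports CSV, MySQL, MongoDB, Excel and JSON storage; supports Python 2 and 3; presents data as charts; thoroughly commented 🚁. Star to show support. For learning and reference only; do not use for commercial purposes.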
Crawler_illegal_cases_in_china
⭐
2,047
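Collection of illegal cases in China involving web crawlers. This project compiles news, materials, laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines. [AD] Chinese Knowledge Graph Portal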
Gain
⭐
1,992
Web crawling framework based on asyncio.
Gocrawl
⭐
1,903
Polite, slim and concurrent web crawler.
Abot
⭐
1,856
Cross-platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Instagram Scraper
⭐
1,841
Scrapes media, likes, followers, tags and all metadata. Inspired by instagram-php-scraper, bot
Crawler
⭐
1,835
An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.
Dxy Covid 19 Crawler
⭐
1,791
COVID-19/2019-nCoV real-time infection crawler and API
Python3 Spider
⭐
1,786
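Hands-on Python crawlers: simulated login to major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping and Taobao. If you like it, please star ❤️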
Rendora
⭐
1,770
Dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern JavaScript websites
Go_spider
⭐
1,697
[Crawler framework (golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.
Red_hawk
⭐
1,624
All-in-one tool for information gathering, vulnerability scanning and crawling. A must-have tool for all penetration testers.
Pspider
⭐
1,538
Simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Decryptlogin
⭐
1,461
APIs for logging in to some websites using requests.
Google Play Scraper
⭐
1,433
Node.js scraper to get data from Google Play
Crawler Detect
⭐
1,400
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Ruia
⭐
1,360
Async Python 3.6+ web scraping micro-framework based on asyncio
Lightcrawler
⭐
1,332
Crawl a website and run it through Google Lighthouse
Scrapoxy
⭐
1,316
Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Geziyor
⭐
1,241
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Wombat
⭐
1,220
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Swiftlinkpreview
⭐
1,213
It makes a preview from a URL, grabbing information such as the title, relevant text and images.
Work_crawler
⭐
1,201
Download comics and novels (a comic and novel downloader). Supported sites: 腾讯漫画 大角虫漫画 有妖气 知音漫客 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫伊甸園 快看漫画 微博动漫 733动漫网 大古漫画网 漫画DB 無限動漫 動漫狂 卡推漫画 动漫之家 动漫屋 古风漫画网 36漫画网 亲亲漫画网 乙女漫画 comico webtoons 咚漫 ニコニコ静画 ComicWalker ヤングエースUP モアイ pixivコミック サイコミ; アルファポリス カクヨム ハーメルン 小説家になろう 起点中文网 八一中文网 顶点小说 落霞小说网 努努书坊 笔趣阁 → epub.
Jd Autobuy
⭐
1,169
Python crawler: automatic JD.com login and online rush-buying of products
Tumblr Crawler
⭐
1,116
Easily download all the photos and videos from specified Tumblr blogs.
Beanbun
⭐
1,094
Beanbun is a multi-process web crawler framework written in PHP and based on Workerman, with good openness and high extensibility.
Crawlergo
⭐
1,070
A powerful dynamic crawler for web vulnerability scanners
Pixeval
⭐
1,013
A Strong, Fast and Flexible Pixiv Client based on .NET Core and WPF
Vulnx
⭐
1,002
vulnx 🕷️ is an intelligent bot and automatic shell injector that detects vulnerabilities in multiple types of CMS (WordPress, Joomla, Drupal, PrestaShop, ...)
Weibo Crawler
⭐
998
Sina Weibo crawler: uses Python to scrape Sina Weibo data and download Weibo images and videos
Dirhunt
⭐
975
Find web directories without bruteforce
Diskover
⭐
971
File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
News Please
⭐
962
news-please - an integrated web crawler and information extractor for news that just works.
Autocrawler
⭐
954
Google, Naver multiprocess image web crawler (Selenium)
Tumblthree
⭐
923
A Tumblr Blog Backup Application
Appcrawler
⭐
919
Automatic app traversal tool based on Appium
Mzitu
⭐
915
👧 Crawler for glamour photo sets (part 2)
Fscrawler
⭐
900
Elasticsearch File System Crawler (FS Crawler)
Zhihu Crawler
⭐
890
zhihu-crawler is a high-performance, horizontally scalable, distributed Java crawler project that supports a free HTTP proxy pool
Finalrecon
⭐
877
The Last Web Recon Tool You'll Need
Pic Gather
⭐
845
[Closed] 🎨 Image collector that supports custom source configuration and is compatible with macOS and Windows.
Sqliv
⭐
835
Massive SQL injection vulnerability scanner
Torbot
⭐
808
Dark Web OSINT Tool
Instagram Profilecrawl
⭐
806
📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.
Lulu
⭐
789
[Unmaintained] A simple and clean video/music/image downloader 👾
Crawler
⭐
780
A high performance web crawler in Elixir.
Pxer
⭐
770
A tool for pixiv.net: a Pixiv crawler anyone can use.
Gospider
⭐
765
Gospider - Fast web spider written in Go
Creeper
⭐
762
🐾 Creeper - The Next Generation Crawler Framework (Go)
Fetchbot
⭐
752
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Magnet Dht
⭐
686
✌️ Python3 BitTorrent DHT crawler
Grab Site
⭐
678
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Xalpha
⭐
673
Backtesting engine for fund investment management
Spidr
⭐
653
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Price Monitor
⭐
634
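JD.com product price monitor: tracks the prices of user-specified products and sends email/WeChat alerts on price drops. Technologies: Python crawler / IP proxy pool / JS API scraping / Selenium page scraping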
Icrawler
⭐
619
A multi-thread crawler framework with many builtin image crawlers provided.
Course Crawler
⭐
605
🎓 Course downloader for China University MOOC, XuetangX, NetEase Cloud Classroom, CNMOOC and iCourses.
Newcrawler
⭐
589
Free Web Scraping Tool with Java
Baiduimagespider
⭐
585
A super-lightweight Baidu Images crawler
Easy Scraping Tutorial
⭐
583
Simple but useful Python web scraping tutorial code.
Douyin
⭐
581
Douyin API for humans, used to crawl popular videos and music
Filemasta
⭐
569
A search application to explore, discover and share online files
Netdiscovery
⭐
569
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
Fess
⭐
554
Fess is a very powerful, easily deployable enterprise search server.
Xxl Crawler
⭐
553
A distributed web crawler framework (XXL-CRAWLER).
Fbcrawl
⭐
532
A Facebook crawler
Xsrfprobe
⭐
527
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Pyptt
⭐
521
PTT API supporting both PTT and PTT2
Go_jobs
⭐
515
A look at the Golang job market
Scan T
⭐
503
A new Python-based crawler with additional features, including network fingerprint search
1-100 of 354 projects
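Several entries above, such as Gocrawl ("Polite, slim and concurrent web crawler") and Fetchbot ("follows the robots.txt policies and crawl delays"), describe polite crawling. The following is a minimal Python sketch of that idea using only the standard library's `urllib.robotparser`; it is not code from either project. The class name `PoliteGate`, the 1-second fallback delay, the `ExampleBot/1.0` user agent and the example.com URLs are illustrative assumptions, and robots.txt is passed in as text so the sketch runs without network access.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit


class PoliteGate:
    """Hypothetical helper: checks robots.txt rules and spaces out
    requests to the same host according to Crawl-delay."""

    # Assumed fallback (seconds) when robots.txt sets no Crawl-delay.
    DEFAULT_DELAY = 1.0

    def __init__(self, robots_txt, user_agent):
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        # parse() takes an iterable of lines and marks the rules as loaded.
        self.parser.parse(robots_txt.splitlines())
        self.last_fetch = {}  # host -> clock value of the last request

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self.parser.can_fetch(self.user_agent, url)

    def delay(self):
        """Crawl-delay for this user agent, or the assumed default."""
        d = self.parser.crawl_delay(self.user_agent)
        return float(d) if d is not None else self.DEFAULT_DELAY

    def wait_turn(self, url, now=time.monotonic, sleep=time.sleep):
        """Sleep until the per-host delay has elapsed, then record the
        request time. now/sleep are injectable so the logic is testable."""
        host = urlsplit(url).netloc
        last = self.last_fetch.get(host)
        if last is not None:
            remaining = self.delay() - (now() - last)
            if remaining > 0:
                sleep(remaining)
        self.last_fetch[host] = now()


if __name__ == "__main__":
    robots = "User-agent: *\nCrawl-delay: 2\nDisallow: /private/\n"
    gate = PoliteGate(robots, "ExampleBot/1.0")
    print(gate.allowed("https://example.com/private/data"))  # False
    print(gate.allowed("https://example.com/public"))        # True
    print(gate.delay())                                      # 2.0
```

Note that `urllib.robotparser` uses simple path-prefix matching and only accepts integer `Crawl-delay` values, so a production crawler may need a fuller robots.txt implementation.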