Awesome Open Source
The Top 246 Spider Open Source Projects
Awesome Spider
⭐
14,707
A collection of crawlers
Colly
⭐
12,872
Elegant Scraper and Crawler Framework for Golang
Proxy_pool
⭐
11,531
Proxy IP pool for Python crawlers (proxy pool)
Examples Of Web Crawlers
⭐
9,664
Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.
Photon
⭐
7,521
Incredibly fast crawler designed for OSINT.
Avbook
⭐
7,484
AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Crawlab
⭐
7,436
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Pholcus
⭐
6,694
Pholcus is a distributed high-concurrency crawler software written in pure golang
Anti Anti Spider
⭐
6,582
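More and more websites have anti-crawler features: some hide key data in images, and some use user-hostile CAPTCHAs. This is a code repository for countering anti-crawler measures, improving skills by battling (without malice) websites with different characteristics. (Submissions of hard-to-scrape websites are welcome.) (Project paused due to work commitments.)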
Node Crawler
⭐
5,714
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Haipproxy
⭐
4,701
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Awesome Crawler
⭐
4,403
A collection of awesome web crawlers and spiders in different languages
Infospider
⭐
4,162
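INFO-SPIDER is a crawler toolbox 🧰 that brings together many data sources, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.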
Toplist
⭐
3,875
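Today's Hot List, an aggregator site that collects trending headlines from major popular websites. Written in Go, it uses multiple goroutines to fetch information asynchronously and quickly. Preview: https://mo.fish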
Toapi
⭐
3,087
Every web site provides APIs.
Core
⭐
2,532
🔞 JAVClub - so your "big sisters" never go missing again
Spiderkeeper
⭐
2,463
Admin UI for Scrapy; an open source Scrapinghub
Dht
⭐
2,335
BitTorrent DHT Protocol && DHT Spider.
Gerapy
⭐
2,301
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Querylist
⭐
2,216
🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
Fiction_house
⭐
2,181
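Fiction House is a multi-platform (web, Android app, WeChat mini program), full-featured, screen-adaptive serialization system for novels and comics, with dedicated sections for featured novels, light novels, and comics. Features include novel/comic categories, search, rankings, completed works, ratings, online reading, bookshelves, and reading history, as well as novel downloads, novel bullet comments, automatic novel/comic collection, updating, and error correction, automatic sharing of novel content to Weibo, automatic email promotion, and automatic link submission to the Baidu search engine.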
Scrapydweb
⭐
2,082
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
Grab
⭐
2,041
Web Scraping Framework
Lianjia Beike Spider
⭐
2,029
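Housing price crawler for Lianjia and Beike, collecting housing price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports CSV, MySQL, MongoDB, Excel, and JSON storage; supports Python 2 and 3; displays data in charts; richly commented 🚁. Stars are appreciated.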
Owllook
⭐
2,013
owllook - a novel search engine
Gain
⭐
1,988
Web crawling framework based on asyncio.
Abot
⭐
1,833
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Python3 Spider
⭐
1,721
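Python crawlers in practice - simulated login to major websites, including but not limited to: slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️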
Go_spider
⭐
1,692
[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.
Pspider
⭐
1,519
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Crawler Detect
⭐
1,373
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Ruia
⭐
1,335
Async Python 3.6+ web scraping micro-framework based on asyncio
Decryptlogin
⭐
1,335
APIs for logging in to some websites using requests.
Geziyor
⭐
1,220
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Antcolony
⭐
1,127
A magnet link crawler implemented in Node.js: http://findit.keenwon.com (original domain: http://findit.so )
Image Downloader
⭐
1,115
Download images from Google, Bing, and Baidu.
Beanbun
⭐
1,082
Beanbun is a multi-process web crawler framework written in PHP, with good openness and high extensibility, based on Workerman.
Glyphhanger
⭐
1,045
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Django Dynamic Scraper
⭐
1,014
Creating Scrapy scrapers via the Django admin interface
Reptile
⭐
995
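🏀 Python3 web crawlers in practice (some with detailed tutorials): Maoyan, Tencent Video, Douban, the Yanzhao graduate admissions site, Weibo, Biquge novels, Baidu Hot Topics, Bilibili, CSDN, NetEase Cloud Reading, Ali Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL schedule, typhoons, the Fantasy Westward Journey and Onmyoji Treasure Pavilion, weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish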
Novel Plus
⭐
968
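Novel-Plus is a full-featured CMS for original literature with multi-device (PC, WAP) reading. It consists of several subsystems, including a front-end portal, an author back-office system, a platform administration system, and a crawler management system. It supports multiple templates, member top-ups, a subscription mode, news publishing, and real-time statistical reports; new books are added automatically and existing books are updated automatically.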
Spider
⭐
942
A configurable web spider with an easy-to-use web console
Baiduyunspider
⭐
889
Search engine for Baidu Cloud network drives, including the crawler and the website
Jspider
⭐
889
JSpider updates the JS decryption method for at least one website every week. Stars welcome. WeChat contact: 13298307816
Zhihu Crawler
⭐
885
zhihu-crawler is a Java-based, high-performance distributed crawler project that supports a free HTTP proxy pool and horizontal scaling
Wechat_articles_spider
⭐
879
A crawler for WeChat official account articles
Blackwidow
⭐
863
A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website.
Go Demo
⭐
787
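Go tutorial with examples, from beginner to advanced, covering basic library usage, design patterns, common interview pitfalls, utility classes, third-party integrations, and more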
Torbot
⭐
770
Dark Web OSINT Tool
Crawler
⭐
769
A high performance web crawler in Elixir.
Creeper
⭐
762
🐾 Creeper - The Next Generation Crawler Framework (Go)
Funpyspidersearchengine
⭐
759
Word2vec personalized search tailored to each user + Scrapy2.3.0 (data crawling) + ElasticSearch7.9.1 (data storage and an external RESTful API) + Django3.1.1 search
Gospider
⭐
692
Gospider - Fast web spider written in Go
Grab Site
⭐
659
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Querido Diario
⭐
656
📰 Brazilian government gazettes, accessible to everyone.
Spidr
⭐
649
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Oneblog
⭐
649
👽 OneBlog, a clean, attractive, powerful, and responsive Java blog
Darknet_chinesetrading
⭐
630
🚇 Monitoring crawler for Chinese-language darknet sites (DEEPMIX)
Icrawler
⭐
615
A multi-thread crawler framework with many builtin image crawlers provided.
Python Spider
⭐
605
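Douban Movie Top 250; crawling JSON data from Douyu and pictures of beautiful women; Taobao; Youyuan; CrawlSpider crawling of partial basic profile information for people seeking matches on Hongniang, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; crawling Duodian; developing APIs with Django; crawling Youyuan information; simulated Zhihu login; simulated GitHub login; simulated Tuchong login; crawling full-site data from the Duodian store; crawling historical articles from WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts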
Istock
⭐
591
👉 A Java stock crawler built on Spring Boot (A-shares only). If you ❤️ it, please ⭐️. The upgraded V2 is under development!
Newcrawler
⭐
584
Free Web Scraping Tool with Java
Douyin
⭐
578
API of DouYin for Humans used to Crawl Popular Videos and Musics
Spider163
⭐
564
Scrapes popular comments from NetEase Cloud Music
Netdiscovery
⭐
564
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
Baiduimagespider
⭐
563
A super lightweight Baidu image crawler
Domain_hunter
⭐
561
A Burp Suite extension that tries to automatically find all subdomains, similar domains, and related domains of an organization, based on traffic.
91porn_php
⭐
547
The simplest 91porn crawler, PHP version
Xxl Crawler
⭐
545
A distributed web crawler framework (XXL-CRAWLER).
Web_kg
⭐
535
Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph
Go_jobs
⭐
512
A look at the job market for Golang
Fbcrawl
⭐
511
A Facebook crawler
Tumblr_spider
⭐
453
Multithreaded Tumblr crawler in Python
Anti Webspider
⭐
452
Anti-crawling technical solutions for the web
Movieheavens
⭐
446
🎬 A simple movie search tool based on PyQt5
Html2article
⭐
439
Main-content (article body) extraction from HTML web pages
Learnpython
⭐
437
Basic Python practice code and various crawler code
Xsrfprobe
⭐
431
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Qqzonemood
⭐
429
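QQZone mood spider and analysis: a multithreaded crawler and data mining tool for QQ Zone. Provides an online service where you scan a QR code to log in and your data is crawled and analyzed automatically, with results presented in the style of the NetEase Cloud Music annual report. Packaged with docker-compose for easy deployment. A QQ Zone lottery mini program is also provided.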
Qzoneexport
⭐
407
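QQ Zone export assistant for backing up QQ Zone posts, blogs, private diaries, albums, videos, message boards, QQ friends, favorites, and shares as files, for easy migration and preservation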
Zhihu
⭐
403
✨ Zhihu Daily - crawler, data analysis, Node.js, Vue.js ...
Bili Spider
⭐
401
📺 Crawler for video information across the whole of Bilibili
Gosint
⭐
397
OSINT Swiss Army Knife
Jdpackage
⭐
389
Cross-platform all-in-one toolkit for JD.com. For learning purposes only. Technical discussion group: 108934299
Alipayorderssupervisor
⭐
383
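✨ Uses Node to monitor Alipay orders and notify the server immediately, implementing a payment interface that requires no merchant contract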
Crawly
⭐
380
Crawly, a high-level web crawling & scraping framework for Elixir.
Qzoneexporter
⭐
378
QQ Zone crawler that can export and display blogs, albums, message boards, posts, photos, videos, and other data.
Signature_algorithm
⭐
373
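Request signing or encryption algorithms for various apps, mini programs, and websites. Currently includes: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, and bangkokair (Bangkok Airways)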
Bdp Dataplatform
⭐
363
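bdp-dataplatform: a big data solution covering big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platforms, and data platform collection, storage, computation, development, and application building.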
Kindlebookmaker
⭐
358
Kindle Book Maker with KindleGen, Make Book from RSS/single URL/directory and so on.
Webster
⭐
356
a reliable high-level web crawling & scraping framework for Node.js.
Fictiondown
⭐
352
Novel download | novel crawling | Qidian | Biquge | export to Markdown | export to txt | convert to epub | ad filtering | automatic proofreading
Xcrawler
⭐
345
A fast, concise, and powerful PHP crawler framework
Freshonions Torscraper
⭐
342
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Weatherspider
⭐
338
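Weather crawler (automatically scrapes and updates weather for towns and cities nationwide on a schedule, and exposes a RESTful query API), with a proxy IP pool that is periodically updated and checked for availability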
Zhihu Login
⭐
336
Simulated Zhihu login, with support for extracting CAPTCHAs and saving cookies
Webspider
⭐
334
Online address: http://119.23.223.90:8000
Celerystalk
⭐
330
An asynchronous enumeration & vulnerability scanner. Run all the tools on all the hosts.
Ttbot
⭐
325
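Toutiao bot that supports user login, following, unfollowing, fetching followed accounts and followers, posting articles, posting Wukong Q&A, liking, commenting, and collecting various types of news, implemented using the Toutiao web API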
Spiders
⭐
321
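Python crawlers that return information in a set format and support downloading, with a simple API provided via Flask. Covers watermark-free Douyin, Pipixia, Kuaishou, NetEase Cloud Music, QQ Music, Migu Music, Lizhi FM audio, Zhihu video, Zuiyou voice and video, Weibo......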
1-100 of 246 projects