Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Crawlab | 10,521 | 4 months ago | 1 | March 03, 2019 | 58 | bsd-3-clause | Go | |||
Distributed web crawler admin platform for managing spiders, regardless of language or framework. | ||||||||||
Distribute_crawler | 3,176 | 7 years ago | 26 | Python | ||||||
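A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: a MongoDB cluster provides the underlying storage, Redis handles distribution, and Graphite displays crawler status. | ||||||||||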
Jd_spider | 728 | 5 years ago | 2 | Python | ||||||
Two dumb distributed crawlers | ||||||||||
Python Spider | 680 | 2 years ago | apache-2.0 | Python | ||||||
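Douban Movie Top 250; scraping JSON data from Douyu and scraping pictures of beautiful women; Taobao; Youyuan; using CrawlSpider to scrape some basic profile information of users on the Hongniang matchmaking site, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Duodian; building APIs with Django; scraping Youyuan site information; simulated login to Zhihu, GitHub, and Tuchong; scraping the entire Duodian store site; scraping historical articles from WeChat official accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts | ||||||||||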
Web_kg | 435 | 4 years ago | 9 | Python | ||||||
Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph | ||||||||||
Spider | 356 | 5 years ago | 8 | Python | ||||||
Crawler examples: Weibo, Bilibili, CSDN, Taobao, Toutiao, Zhihu, Douban, the Zhihu app, and Dianping | ||||||||||
Scrapy Mongodb | 327 | 23 | 6 years ago | 22 | January 08, 2018 | 6 | other | Python | ||
MongoDB pipeline for Scrapy. This module supports both MongoDB in standalone setups and replica sets. scrapy-mongodb will insert the items to MongoDB as soon as your spider finds data to extract. | ||||||||||
Findtrip | 324 | 8 years ago | 1 | Python | ||||||
Flight ticket crawler for Qunar and Ctrip, covering multiple sites (Scrapy + Selenium + PhantomJS + MongoDB) | ||||||||||
Data Engineering Projects | 322 | a year ago | 5 | Jupyter Notebook | ||||||
Personal Data Engineering Projects | ||||||||||
Pigat | 187 | 2 years ago | 1 | Python | ||||||
pigat (Passive Intelligence Gathering Aggregation Tool) |