Awesome Open Source
The Top 38 Crawling Open Source Projects
Scrapy
⭐
39,500
Scrapy, a fast high-level web crawling & scraping framework for Python.
Colly
⭐
12,896
Elegant Scraper and Crawler Framework for Golang
Newspaper
⭐
10,611
News, full-text, and article metadata extraction in Python 3.
Headless Chrome Crawler
⭐
4,879
Distributed crawler powered by Headless Chrome
Ferret
⭐
4,373
Declarative web scraping
Apify Js
⭐
2,693
Apify SDK: the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs with headless Chrome and Puppeteer, among other tools.
Nutch
⭐
2,163
Apache Nutch is an extensible and scalable web crawler
Awesome Puppeteer
⭐
1,495
A curated list of awesome puppeteer resources.
Lulu
⭐
787
[Unmaintained] A simple and clean video/music/image downloader 👾
Easy Scraping Tutorial
⭐
567
Simple but useful Python web scraping tutorial code.
Scrapy Selenium
⭐
515
Scrapy middleware to handle JavaScript pages using Selenium
Dataflowkit
⭐
447
Extract structured data from websites. Website scraping.
Isp Data Pollution
⭐
422
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Crawly
⭐
383
Crawly, a high-level web crawling & scraping framework for Elixir.
Webster
⭐
357
A reliable high-level web crawling & scraping framework for Node.js.
Spidermon
⭐
299
Scrapy extension for monitoring spider execution.
Sasila
⭐
283
A flexible, friendly crawler framework
Gopa
⭐
277
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stopstalk Deployment
⭐
267
Stop stalking and start StopStalking 😉
Spidy
⭐
255
The simple, easy to use command line web crawler.
Memorious
⭐
244
Distributed crawling framework for documents and structured data.
Cdp4j
⭐
232
cdp4j - Chrome DevTools Protocol for Java
Antch
⭐
193
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
N2h4
⭐
173
A tool for collecting Naver news
Linkedin Profile Scraper
⭐
150
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Crawler
⭐
147
Go process used to crawl websites
Massivedl
⭐
137
Download a large list of files concurrently
Holiday Cn
⭐
135
📅🇨🇳 Chinese statutory holiday data, automatically scraped daily from State Council announcements
Squidwarc
⭐
122
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Corpuscrawler
⭐
120
Crawler for linguistic corpora
Dotnetcrawler
⭐
96
DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Instagram Bot
⭐
92
An Instagram bot developed using the Selenium Framework
Grawler
⭐
80
Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Dig Etl Engine
⭐
77
Download DIG to run on your laptop or server.
Arachnid
⭐
68
Powerful web scraping framework for Crystal
Python Crawling Tutorial
⭐
57
Python crawling tutorial
Crawling Projects
⭐
47
Web scraping and automation using Python
Pdf_downloader
⭐
17
A Scrapy Spider for downloading PDF files from a webpage.
1-38 of 38 projects