Awesome Open Source
The Top 178 Scraper Open Source Projects
Huginn
⭐
30,620
Create agents that monitor and act on your behalf. Your agents are standing by!
Cheerio
⭐
23,213
Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Annie
⭐
13,903
👾 Fast, simple and clean video downloader
Colly
⭐
12,872
Elegant Scraper and Crawler Framework for Golang
Newspaper
⭐
10,600
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Chinese Xinhua
⭐
7,590
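📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.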
Avbook
⭐
7,484
AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Headless Chrome Crawler
⭐
4,876
Distributed crawler powered by Headless Chrome
Instagram Scraper
⭐
4,585
Scrapes an Instagram user's photos and videos
Awesome Crawler
⭐
4,403
A collection of awesome web crawlers and spiders in different languages
Ferret
⭐
4,369
Declarative web scraping
Scrape It
⭐
3,636
🔮 A Node.js scraper for humans.
Autoscraper
⭐
3,216
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Weibo_terminater
⭐
2,284
The final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator
Querylist
⭐
2,216
🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
Node Ytdl Core
⭐
2,093
YouTube video downloader in JavaScript.
Instagram Scraper
⭐
1,779
Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot
Jobfunnel
⭐
1,439
Scrape job websites into a single spreadsheet with no duplicates.
Google Play Scraper
⭐
1,408
Node.js scraper to get data from Google Play
Scrapoxy
⭐
1,293
Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Rod
⭐
1,283
A Devtools driver for web automation and scraping
Geziyor
⭐
1,220
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Wombat
⭐
1,216
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Jd Autobuy
⭐
1,158
Python crawler for JD.com (Jingdong): automatic login and online flash-purchasing of products
Django Dynamic Scraper
⭐
1,014
Creating Scrapy scrapers via the Django admin interface
Node Website Scraper
⭐
886
Download website to local directory (including all css, images, js, etc.)
Scanless
⭐
857
Online port scan scraper
Lulu
⭐
786
[Unmaintained] A simple and clean video/music/image downloader 👾
Crawler
⭐
769
A high performance web crawler in Elixir.
Imdbpy
⭐
761
IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies
Informer
⭐
727
A Telegram Mass Surveillance Bot in Python
Emby.plugins.javscraper
⭐
678
A Japanese movie scraper plugin for Emby/Jellyfin that can fetch movie information from certain websites.
Spidr
⭐
649
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Surgeon
⭐
644
Declarative DOM extraction expression evaluator. 👨⚕️
Instagram Crawler
⭐
622
Get Instagram posts/profile/hashtag data without using Instagram API
Imagescraper
⭐
619
✂️ High performance, multi-threaded image scraper
Scala Scraper
⭐
619
A Scala library for scraping content from HTML pages
Onlyfans
⭐
617
Scrape all the media from an OnlyFans account - Updated regularly
Instagram4j
⭐
605
📷 Instagram private API in Java
Fbcrawl
⭐
511
A Facebook crawler
Jikan
⭐
504
Unofficial MyAnimeList PHP+REST API which provides functions other than the official API
Operative Framework
⭐
493
Operative Framework is an OSINT investigation framework: you can interact with multiple targets, execute multiple modules, create links between targets, export reports to PDF, add notes to targets or results, interact with a RESTful API, and write your own modules.
Redditdownloader
⭐
489
Scrapes Reddit to download media of your choice.
Googledictionaryapi
⭐
467
Google does not provide Google Dictionary API so I created one.
Nintendo Switch Eshop
⭐
448
Crawler for Nintendo Switch eShop
Dataflowkit
⭐
446
Extract structured data from websites. Website scraping.
Bookcorpus
⭐
432
Crawl BookCorpus
Scrapedin
⭐
430
LinkedIn Scraper (currently working 2020)
Gosint
⭐
397
OSINT Swiss Army Knife
Advanced Web Scraping Tutorial
⭐
383
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Finviz
⭐
380
Unofficial API for finviz.com
Crawly
⭐
380
Crawly, a high-level web crawling & scraping framework for Elixir.
Php Goose
⭐
379
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
Micro Open Graph
⭐
370
A tiny Node.js microservice to scrape open graph data with joy.
Linkedin_scraper
⭐
368
A library that scrapes LinkedIn for user data
Finance Go
⭐
362
📊 Financial markets data library implemented in Go.
Scrapers
⭐
357
A list of scrapers from around the web.
Snscrape
⭐
350
A social networking service scraper in Python
Xcrawler
⭐
345
A fast, concise, and powerful PHP crawler framework
Freshonions Torscraper
⭐
342
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Osi.ig
⭐
336
Information gathering for Instagram.
Javgo
⭐
335
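JavGo is a full-featured, all-in-one media application that combines video management, metadata scraping, video processing, and resource search. It supports scraping javbus, jav321, javdb, and javlibrary, magnet search on db and bus, and fetching video comments from library.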
Katana
⭐
329
A Python tool for Google hacking
Xidel
⭐
324
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Hquery.php
⭐
299
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Socialmanagertools Gui
⭐
293
🤖 👻 Desktop application for Instagram Bot, Twitter Bot and Facebook Bot
Linkedin
⭐
291
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Webinspector
⭐
288
Ruby gem to completely inspect a web page. It scrapes a given URL and returns its meta, links, images, and more.
Cryptocmd
⭐
276
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Java Spider
⭐
270
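A hands-on Java crawler framework built on top of the webmagic framework. It can crawl news content from Tencent, Sohu, Toutiao (integrated as a separate feature), and other sources. Combined with Elasticsearch, it implements automated crawling and is already in production use.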
Rcrawler
⭐
267
An R web crawler and scraper
Weibo_terminator_workflow
⭐
260
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
Heroku_ebooks
⭐
247
A script to generate Markov chains and to post to an _ebooks account on Twitter using Heroku
Polite
⭐
247
Be nice on the web
Instagram Proxy Api
⭐
246
CORS compliant API to access Instagram's public data
Getsy
⭐
237
A simple browser/client-side web scraper.
Urs
⭐
223
Universal Reddit Scraper - Scrape Subreddits, Redditors, and submission comments. A command-line tool written in Python (PRAW).
Scrape Linkedin Selenium
⭐
220
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Ruiji.net
⭐
219
Crawler framework, distributed crawler extractor
Scrapysharp
⭐
215
reborn of https://bitbucket.org/rflechner/scrapysharp
Goose Parser
⭐
211
Universal scraping tool, which allows you to extract data using multiple environments
Skrape.it
⭐
209
A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
Tianyancha
⭐
206
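A pip-installable Tianyancha scraper API that saves the business registration information of one or more specified companies to Excel/JSON format in one click. A batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform.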
Jsonframe Cheerio
⭐
195
Simple multi-level scraper with JSON input/output for Cheerio
Media Scraper
⭐
192
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
Thepiratebay
⭐
185
💀 The Pirate Bay node.js client
Unfurl
⭐
185
Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based ⚡️
Anime Dl
⭐
183
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Docsearch Scraper
⭐
180
DocSearch - Scraper
Goribot
⭐
180
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Instagram Crawler
⭐
176
Crawl Instagram photos, posts, and videos for download.
Unhtml.rs
⭐
176
A magic HTML parser
Gmdb
⭐
176
GMDB is the ultra-simple, cross-platform Movie Library with Features (Search, Take Note, Watch Later, Like, Import, Learn, Instantly Torrent Magnet Watch)
Readablewebproxy
⭐
172
Rewriting web proxy and archival tool. At this point, it just tries to download all the things.
Novel
⭐
172
A novel (fiction) website built on Laravel 5.2
Scrape Twitter
⭐
165
🐦 Access Twitter data without an API key. [DEPRECATED]
Scrapelib
⭐
162
⛏ a library for scraping things
Datmusic Api
⭐
159
Alternative for VK Audio API
Demeter
⭐
154
Demeter is a tool for scraping the calibre web ui
Serpscrap
⭐
152
SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet, and type from the search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in the search results or your own. It's useful for SEO and business-related research tasks.
1-100 of 178 projects