Search results for scraper crawler
262 search results found
Scrapy
⭐
49,918
Scrapy, a fast high-level web crawling & scraping framework for Python.
Easyspider
⭐
36,416
A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual browser automation testing, data collection, and crawler application, usable graphically without code
Lux
⭐
29,190
👾 Fast and simple video download library and CLI tool written in Go
Colly
⭐
23,382
Elegant Scraper and Crawler Framework for Golang
Crawlee
⭐
12,871
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Webmagic
⭐
11,080
A scalable web crawler framework for Java.
Avbook
⭐
8,777
AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Ferret
⭐
5,540
Declarative web scraping
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Douyin_tiktok_download_api
⭐
4,844
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T
Rod
⭐
4,505
A Devtools driver for web automation and scraping
Mygptreader
⭐
4,267
A community-driven way to read and chat with AI bots - powered by chatGPT.
Querylist
⭐
2,598
🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
Googlescraper
⭐
2,540
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Grab
⭐
2,292
Web Scraping Framework
Weibo_terminater
⭐
2,265
Final Weibo crawler: scrape anything from Weibo, including comments, Weibo contents, followers, anything. The Terminator
Awesome Puppeteer
⭐
2,245
A curated list of awesome puppeteer resources.
Google Play Scraper
⭐
2,108
Node.js scraper to get data from Google Play
Geziyor
⭐
1,892
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
Scrapely
⭐
1,668
A pure-python HTML screen-scraping library
Jd Autobuy
⭐
1,309
Python crawler: automatic JD.com (京东) login and online flash-purchasing of products
Article Extractor
⭐
1,297
To extract main article from given URL with Node.js
Wombat
⭐
1,297
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Cariddi
⭐
1,228
Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
Scrapy Cluster
⭐
1,137
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Parliament Scraper
⭐
1,049
Public Data Scraper for Parliament Data for the EU and other Parliaments
Crawler User Agents
⭐
1,045
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome ⭐
Mlscraper
⭐
935
🤖 Scrape data from HTML websites automatically by just providing examples
Instagram Crawler
⭐
922
Get Instagram posts/profile/hashtag data without using Instagram API
Awesome Datahoarding
⭐
892
List of data-hoarding related tools
Kimuraframework
⭐
874
Kimurai is a modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites
Scrapyrt
⭐
793
HTTP API for Scrapy spiders
Crawly
⭐
790
Crawly, a high-level web crawling & scraping framework for Elixir.
Spidr
⭐
775
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Till
⭐
770
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Lulu
⭐
752
[Unmaintained] A simple and clean video/music/image downloader 👾
Skrape.it
⭐
714
A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
Bookcorpus
⭐
698
Crawl BookCorpus
Google Play Scraper
⭐
645
Google play scraper for Python inspired by <facundoolano/google-play-scraper>
Easy Scraping Tutorial
⭐
618
Simple but useful Python web scraping tutorial code.
Scrapedin
⭐
589
LinkedIn Scraper (currently working 2020)
Newcrawler
⭐
583
Free Web Scraping Tool with Java
Nintendo Switch Eshop
⭐
513
Crawler for Nintendo Switch eShop
Nodejs Stuff
⭐
484
Node.js libs I want to keep in mind.
Webster
⭐
465
a reliable high-level web crawling & scraping framework for Node.js.
Pywebcopy
⭐
455
Locally saves webpages to your hard disk with images, css, js & links as is.
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Awesome Scrapy
⭐
450
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
Mdcx
⭐
435
Movie metadata scraper
Spider
⭐
426
The fastest web crawler written in Rust. Maintained by @a11ywatch.
Fbcrawl
⭐
415
A Facebook crawler
Pulsarrpa
⭐
413
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Linkedin Profile Scraper Api
⭐
404
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.
Dude
⭐
397
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Search Engines Scraper
⭐
377
Search google, bing, yahoo, and other search engines with python
Scrapy Zyte Smartproxy
⭐
363
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Moodle Dl
⭐
355
Moodle-DL downloads course content fast from Moodle (e.g. lecture PDFs)
Hquery.php
⭐
345
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Linkedindumper
⭐
337
Python 3 script to dump/scrape/extract company employees from LinkedIn API
Stweet
⭐
308
Advanced Python library to scrape Twitter (tweets, users) from the unofficial API
Memorious
⭐
302
Lightweight web scraping toolkit for documents and structured data.
Crawler
⭐
285
Library for Rapid (Web) Crawler and Scraper Development
Gopa
⭐
281
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Web Scraping
⭐
281
More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup
Harvestman
⭐
278
Quick and dirty web crawling.
Youtube Projects
⭐
272
This repository contains all the code I use in my YouTube tutorials.
Ant
⭐
271
A web crawler for Go
Python Automation Scripts
⭐
264
Simple yet powerful automation scripts.
Sasila
⭐
264
A flexible, friendly crawler framework
Ruiji.net
⭐
261
crawler framework, distributed crawler extractor
Weibo_terminator_workflow
⭐
259
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
Arachnid
⭐
246
Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Rcrawler
⭐
240
An R web crawler and scraper
Pythonscraping
⭐
240
The code for the book: Python Scraping
Lightnovel_epub
⭐
233
🍭 epub generator for (light) novels. Supported sites: 轻之国度, 轻小说文库
Nudecrawler
⭐
231
Crawl telegra.ph searching for nudes!
Goose Parser
⭐
222
Universal scraping tool, which allows you to extract data using multiple environments
Wayback Machine Scraper
⭐
219
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Zimit
⭐
209
Make a ZIM file from any Web site and surf offline!
Spidey
⭐
179
A loose framework for crawling and scraping web sites.
Antch
⭐
177
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Crawley
⭐
167
Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
Findpapers
⭐
164
Findpapers: A tool for helping researchers who are looking for related works
Goribot
⭐
162
[Crawler/Scraper for Golang] 🕷 A lightweight, distribution-friendly Golang crawler framework.
Instagram Crawler
⭐
157
Crawl instagram photos, posts and videos for download.
Spidy
⭐
154
Domain names collector - Crawl websites and collect domain names along with their availability status.
Site Audit Seo
⭐
151
Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx, Google Drive.
Google News Scraper
⭐
144
Lightweight scraper for Google News
Estela
⭐
142
estela, an elastic web scraping cluster 🕸
Onegram
⭐
136
This repository is no longer maintained.
Scrape
⭐
135
a command-line web scraping tool
Grawler
⭐
128
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Double Agent
⭐
120
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Evine
⭐
117
Interactive CLI Web Crawler
Scraply
⭐
114
Scraply, a simple DOM scraper to fetch information from any HTML-based website
Od Database
⭐
113
Distributed crawler, database and web frontend for public directories indexing
Gflare Tk
⭐
110
Open-Source Python Based SEO Web Crawler
Zyte Smartproxy Headless Proxy
⭐
106
A complimentary proxy to help use SPM with headless browsers
Node Web Crawler
⭐
104
A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.
Related Searches
Python Crawler (4,545)
Python Scraper (3,513)
Javascript Scraper (2,047)
Scraper Scrape (1,534)
Scraper Web Crawler (1,528)
Javascript Crawler (1,142)
Crawler Spider (1,044)
Crawler Scrapy (1,002)
Java Crawler (806)
Html Scraper (757)
1-100 of 262 search results