Awesome Open Source
Search results for crawler web crawler
214 search results found
Scrapy
⭐
49,918
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee
⭐
12,101
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlab
⭐
10,521
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Spider Flow
⭐
8,075
A new-generation crawler platform that defines crawl flows graphically, so a crawler can be built without writing code.
Katana
⭐
7,995
A next-generation crawling and spidering framework.
Awesome Web Scraping
⭐
6,060
List of libraries, tools and APIs for web scraping and data processing.
Awesome Crawler
⭐
5,859
A collection of awesome web crawlers and spiders in different languages
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Douyin_tiktok_download_api
⭐
4,844
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T…
Rod
⭐
4,505
A Devtools driver for web automation and scraping
Hakrawler
⭐
4,120
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Browser Fingerprinting
⭐
3,353
Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
Webcollector
⭐
2,974
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Gecco
⭐
2,403
Easy-to-use lightweight web crawler
Grab
⭐
2,292
Web Scraping Framework
Gospider
⭐
2,190
Gospider - Fast web spider written in Go
Abot
⭐
1,991
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Pspider
⭐
1,675
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Core
⭐
1,290
The complete web scraping toolkit for PHP.
Lightnovel Crawler
⭐
1,185
Generate and download e-books from online sources.
Storm Crawler
⭐
834
A scalable, mature and versatile web crawler based on Apache Storm
Spidr
⭐
775
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Till
⭐
770
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Fetchbot
⭐
758
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Browsertrix Crawler
⭐
470
Run a high-fidelity browser-based crawler in a single Docker container
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Spidersuite
⭐
447
Advanced web spider/crawler for cyber security professionals
Pulsarrpa
⭐
413
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Dcrawl
⭐
411
Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Dude
⭐
397
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Hquery.php
⭐
345
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Archivebot
⭐
328
ArchiveBot, an IRC bot for archiving websites
Supercrawler
⭐
324
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Polite
⭐
310
Be nice on the web
Spidy
⭐
287
The simple, easy to use command line web crawler.
Crawler
⭐
285
Library for Rapid (Web) Crawler and Scraper Development
Web Scraping
⭐
281
More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup
Gopa
⭐
281
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Youtube Projects
⭐
272
This repository contains all the code I use in my YouTube tutorials.
Ant
⭐
271
A web crawler for Go
Lagoujob
⭐
250
Job data mining repo for lagou.com
Football Data Collection
⭐
246
Web Scraper used to create Kaggle European Soccer database
Rcrawler
⭐
240
An R web crawler and scraper
Laravel
⭐
238
Laravel adapter for Roach, the complete web scraping toolkit for PHP.
Docbao
⭐
233
A tool for crawling and analyzing keywords from Vietnamese online news sites
Nudecrawler
⭐
231
Crawl telegra.ph searching for nudes!
News Crawl
⭐
229
News crawling with StormCrawler - stores content as WARC
Infinitycrawler
⭐
221
A simple but powerful web crawler library for .NET
Wayback Machine Scraper
⭐
219
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Crawler Commons
⭐
217
A set of reusable Java components that implement functionality common to any web crawler
Selenops
⭐
215
A Swift Web Crawler 🕷
Zimit
⭐
209
Make a ZIM file from any Web site and surf offline!
Crawley
⭐
208
The unix-way web crawler
Strong Web Crawler
⭐
204
An advanced web crawler based on C#.NET + PhantomJS + Selenium. Can execute Javascrip…
Portia Dashboard
⭐
190
portia-dashboard is a visual web crawler based on scrapinghub/portia
Crawlab Lite
⭐
184
Lite version of Crawlab, a crawler management platform.
Digger
⭐
180
Digger is a powerful and flexible web crawler implemented in pure Go
Zhihu Crawler People
⭐
179
A simple distributed crawler for zhihu && data analysis
Antch
⭐
177
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Crawler_shopee_public
⭐
169
Asynchronous Shopee crawler + competitor seller analysis
Collector Http
⭐
162
Norconex Web Crawler (or spider) is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Cocrawler
⭐
159
CoCrawler is a versatile web crawler built using modern tools and concurrency.
Ir
⭐
155
A project that automatically calculates income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir
Juno_crawler
⭐
147
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Google News Scraper
⭐
144
Lightweight scraper for Google News
Estela
⭐
142
estela, an elastic web scraping cluster 🕸
Scrapy Training
⭐
141
Scrapy Training companion code
Not Your Average Web Crawler
⭐
130
A web crawler (for bug hunting) that gathers more than you can imagine.
Proxy
⭐
123
A simple tool for fetching usable proxies from several websites.
Php Crawler
⭐
121
A PHP crawler that finds emails on the internet
Dyer
⭐
118
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
Evine
⭐
117
Interactive CLI Web Crawler
Crawlbox
⭐
112
Easy way to brute-force web directory.
Gflare Tk
⭐
110
Open-Source Python Based SEO Web Crawler
Abotx
⭐
106
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Seleniumcrawler
⭐
105
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Node Web Crawler
⭐
104
A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.
Zhihu_crawler
⭐
100
a crawler for zhihu
Crawl Anywhere
⭐
98
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
Krawler
⭐
96
A web crawling framework written in Kotlin
Polipus
⭐
95
Polipus: distributed and scalable web-crawler framework
Terpene Profile Parser For Cannabis Strains
⭐
93
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Actor Scraper
⭐
93
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Scrapyd Cluster On Heroku
⭐
90
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Tacocat
⭐
86
A platform displaying the latest software engineer job information to entry-level new graduates
Webcrawler
⭐
86
Web crawler to download pictures from zhihu.com
Crabler
⭐
85
Web Crawler for Crabs
Bathyscaphe
⭐
83
Fast, highly configurable, cloud native dark web crawler.
Scrapper
⭐
83
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
Arachnid
⭐
80
Powerful web scraping framework for Crystal
Node Search Engine
⭐
79
Sample search engine with web crawler, built on Node.js + CouchDB + Limestone
Webscrapper
⭐
77
Simple and powerful all-in-one Telegram bot to scrape webpages using Requests, html5lib and BeautifulSoup
Goodreadsscraper
⭐
76
Scrape data from Goodreads using Scrapy and Selenium 📚
Scrapfly Scrapers
⭐
76
Web scrapers for popular targets, powered by Scrapfly.io
Scraping Ebay
⭐
73
Scraping Ebay's products using Scrapy Web Crawling Framework
Spotifyscraper
⭐
72
Spotify scraper that extracts information from Spotify and downloads MP3s with the song's cover art
Davedavefind
⭐
71
A simple search engine based on the web crawler developed in Udacity's CS101 course.
Related Searches
Python Crawler (4,545)
Scraper Web Crawler (1,388)
Javascript Crawler (1,142)
Crawler Spider (1,073)
Crawler Scrapy (988)
Scraper Crawler (896)
Java Crawler (807)
1-100 of 214 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.