Awesome Open Source
Search results for crawler webscraper
12 search results found
Crawlee
⭐
12,106
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlab
⭐
10,521
Distributed web crawler admin platform for spider management, regardless of language or framework.
Spider Flow
⭐
8,075
A new-generation crawler platform that defines crawler workflows graphically, so you can build crawlers without writing code.
Awesome Crawler
⭐
5,859
A collection of awesome web crawlers and spiders in different languages
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Hakrawler
⭐
4,120
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
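Endpoint and asset discovery of this kind begins by pulling links out of fetched HTML. The following is a minimal sketch that uses only Python's standard-library `html.parser` and assumes the page has already been downloaded. It is not Hakrawler's implementation. `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from href/src attributes of one HTML page (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Both links (<a href>) and assets (<script src>, <img src>) count as endpoints.
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)  # resolve relative paths
                if absolute not in self.links:            # keep first-seen order, skip repeats
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="/login">Login</a><script src="app.js"></script><a href="/login">again</a>'
print(extract_links(page, "https://example.com/"))
# ['https://example.com/login', 'https://example.com/app.js']
```

A real crawler would feed each discovered URL back into its queue. This sketch covers only the extraction step.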
Webcollector
⭐
2,974
WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Gecco
⭐
2,403
Easy-to-use, lightweight web crawler
Abot
⭐
1,991
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Pspider
⭐
1,675
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Lightnovel Crawler
⭐
1,185
Generate and download e-books from online sources.
Storm Crawler
⭐
834
A scalable, mature and versatile web crawler based on Apache Storm
Spidr
⭐
775
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Fetchbot
⭐
758
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
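Honoring robots.txt rules and crawl delays can be sketched with Python's standard-library `urllib.robotparser`. To keep the example offline, it parses an inline robots.txt instead of fetching one. The rules, the user-agent string `my-crawler`, and the URLs are illustrative assumptions. This does not show how Fetchbot itself works.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; a real crawler would download it from the site's /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

agent = "my-crawler"  # hypothetical user-agent string
print(rp.can_fetch(agent, "https://example.com/index.html"))  # True
print(rp.can_fetch(agent, "https://example.com/private/a"))   # False
print(rp.crawl_delay(agent))                                   # 2
```

A polite crawler would skip disallowed URLs and wait at least `crawl_delay(agent)` seconds between requests to the same host.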
Browsertrix Crawler
⭐
470
Run a high-fidelity browser-based crawler in a single Docker container
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Pulsarrpa
⭐
413
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Dcrawl
⭐
411
Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
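Building a list of unique domain names reduces to normalizing each discovered URL to its hostname and deduplicating. The following is a minimal sketch of that step only, using `urllib.parse.urlsplit`. `unique_domains` is a hypothetical helper, and the sample URLs are made up. It does not reproduce Dcrawl's crawling or threading.

```python
from urllib.parse import urlsplit


def unique_domains(urls):
    """Return hostnames in first-seen order, lowercased, with duplicates removed (hypothetical helper)."""
    seen = {}  # dict preserves insertion order, acting as an ordered set
    for url in urls:
        host = urlsplit(url).hostname  # lowercased; None for relative or host-less URLs
        if host and host not in seen:
            seen[host] = None
    return list(seen)


found = [
    "https://Example.com/a",
    "http://example.com/b",
    "https://news.example.org/",
    "/relative/path",
]
print(unique_domains(found))  # ['example.com', 'news.example.org']
```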
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Hquery.php
⭐
345
An extremely fast web scraper that parses megabytes of invalid HTML in the blink of an eye. PHP5.3+, no dependencies.
Archivebot
⭐
328
ArchiveBot, an IRC bot for archiving websites
Supercrawler
⭐
324
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
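A concurrency limit caps how many requests are in flight at once. One common way to sketch this in Python is an `asyncio.Semaphore`. In this example, `fake_fetch` is a hypothetical stand-in for a real HTTP request, so the code runs offline. The limit of 2 is an arbitrary assumption. Rate limiting and robots.txt handling are not covered here, and this is not Supercrawler's implementation.

```python
import asyncio

MAX_CONCURRENCY = 2  # assumed limit; tune per target site


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl(urls, limit=MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)
    active = 0  # fetches currently running
    peak = 0    # highest number of simultaneous fetches observed

    async def worker(url):
        nonlocal active, peak
        async with semaphore:  # at most `limit` fetches run at once
            active += 1
            peak = max(peak, active)
            try:
                return await fake_fetch(url)
            finally:
                active -= 1

    pages = await asyncio.gather(*(worker(u) for u in urls))
    return pages, peak


pages, peak = asyncio.run(crawl([f"https://example.com/{i}" for i in range(5)]))
print(len(pages), peak)  # 5 2
```

`asyncio.gather` returns results in input order, so `pages[i]` corresponds to the i-th URL even though fetches complete in arbitrary order.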
Spidy
⭐
287
The simple, easy to use command line web crawler.
Crawler
⭐
285
Library for Rapid (Web) Crawler and Scraper Development
Gopa
⭐
281
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Ant
⭐
271
A web crawler for Go
Lagoujob
⭐
250
Job data mining repo for lagou.com
Football Data Collection
⭐
246
Web Scraper used to create Kaggle European Soccer database
Rcrawler
⭐
240
An R web crawler and scraper
News Crawl
⭐
229
News crawling with StormCrawler - stores content as WARC
Infinitycrawler
⭐
221
A simple but powerful web crawler library for .NET
Crawler Commons
⭐
217
A set of reusable Java components that implement functionality common to any web crawler
Selenops
⭐
215
A Swift Web Crawler 🕷
Crawley
⭐
208
The unix-way web crawler
Strong Web Crawler
⭐
204
An advanced web crawler built on C#.NET + PhantomJS + Selenium that can execute JavaScript
Portia Dashboard
⭐
190
portia-dashboard is a visual web crawler based on scrapinghub/portia
Crawlab Lite
⭐
184
Lite version of Crawlab, a lightweight crawler management platform.
Digger
⭐
180
Digger is a powerful and flexible web crawler implemented by pure golang
Zhihu Crawler People
⭐
179
A simple distributed crawler and data analysis tool for Zhihu
Antch
⭐
177
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Crawler_shopee_public
⭐
169
Asynchronous Shopee crawler + competitor seller analysis
Collector Http
⭐
162
Norconex Web Crawler (or spider) is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Cocrawler
⭐
159
CoCrawler is a versatile web crawler built using modern tools and concurrency.
Google News Scraper
⭐
144
Lightweight scraper for Google News
Not Your Average Web Crawler
⭐
130
A web crawler (for bug hunting) that gathers more than you can imagine.
Proxy
⭐
123
A simple tool for fetching usable proxies from several websites.
Php Crawler
⭐
121
A PHP crawler that finds email addresses on the internet
Dyer
⭐
118
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
Evine
⭐
117
Interactive CLI Web Crawler
Crawlbox
⭐
112
An easy way to brute-force web directories.
Gflare Tk
⭐
110
Open-Source Python Based SEO Web Crawler
Abotx
⭐
106
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Seleniumcrawler
⭐
105
An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Node Web Crawler
⭐
104
A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.
Zhihu_crawler
⭐
100
A crawler for Zhihu
Crawl Anywhere
⭐
98
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
Krawler
⭐
96
A web crawling framework written in Kotlin
Polipus
⭐
95
Polipus: distributed and scalable web-crawler framework
Terpene Profile Parser For Cannabis Strains
⭐
93
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Tacocat
⭐
86
A platform displaying the latest software engineering job listings for entry-level new graduates
Webcrawler
⭐
86
Web crawler to download pictures from zhihu.com
Crabler
⭐
85
Web Crawler for Crabs
Bathyscaphe
⭐
83
Fast, highly configurable, cloud native dark web crawler.
Arachnid
⭐
80
Powerful web scraping framework for Crystal
Node Search Engine
⭐
79
Sample search engine with web crawler, built on Node.js + CouchDB + Limestone
Goodreadsscraper
⭐
76
Scrape data from Goodreads using Scrapy and Selenium 📚
Spotifyscraper
⭐
72
Spotify scraper that extracts all the information from Spotify and downloads MP3s with the song's cover art
Davedavefind
⭐
71
A simple search engine based on the web crawler developed in Udacity's CS101 course.
Webkitcrawler
⭐
69
QtWebKit-based web crawler
Crawler
⭐
64
I needed a serious web crawler for search engine applications. This is it.
Dotnetcrawler
⭐
63
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-w
Simplestorm
⭐
62
Simple Storm-like distributed application implementation
Gocrawler
⭐
60
A distributed web crawler implemented using Go, Postgres, RabbitMQ and Docker
Keyword_based_sina_weibo_crawler
⭐
59
A web crawler for Sina Weibo that searches for and retrieves microblogs containing certain keywords; a simple Python crawler exercise.
Siteshooter
⭐
58
📷 Automate full website screenshots and PDF generation with multiple viewport support.
Owlcrawler
⭐
54
Crawl the web using nats.io and Go
Pysearch
⭐
48
Web crawler and Search engine in Python.
Webcrawler
⭐
47
Just a simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.
Webcollector Python
⭐
47
WebCollector-Python is an open source web crawler framework based on Python. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Wpcrawler
⭐
43
A web crawler for a single WordPress site
Hawk
⭐
43
Blazingly fast web crawler for mapping and updating data
Ajax_crawler
⭐
41
A flexible web crawler based on Scrapy for fetching most Ajax and other types of web pages. Easy to use: to customize a new web crawler, you just need to write a config file and run it.
Maman
⭐
40
Rust web crawler that saves pages to Redis
Creepy
⭐
40
Dead simple web crawler for Python
Jason The Miner
⭐
40
⛏ A versatile Web scraper for Node.js
Scrapemate
⭐
39
Golang Crawling and scraping framework
Learncpp Download
⭐
39
An advanced web scraper tool that seamlessly fetches and combines over 200 online tutorials into a convenient offline PDF format.
X Ray Crawler
⭐
39
Friendly web crawler for x-ray
Validate Website
⭐
38
Web crawler for checking the validity of your documents.
Jiayuan
⭐
37
A web crawler and data analysis repo using Python 3.5, R, Excel 2016, and TAGUL
Grell
⭐
36
Web crawler with a Ruby API
Flink Crawler
⭐
35
Continuous scalable web crawler built on top of Flink and crawler-commons
Python Marmiton
⭐
35
Python API to search & get recipes from the 'marmiton.com' website (web crawler, unofficial)
Spiderx
⭐
34
A simple web-crawler development framework based on .Net Core.
Cobweb Lnx
⭐
34
CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.
Market Trend Prediction
⭐
32
A project for a knowledge graph course. It uses historical stock prices and integrates social media listening from customers to predict market trends for the Dow Jones Industrial Average (DJIA).
Crawler4j
⭐
30
Open source, simple web crawler for Java. Simple, flexible, and lightweight.
Stormscraper
⭐
29
A Storm based web crawler with Cassandra backend
Webcrawler
⭐
25
A web crawler based on requests-html, mainly intended for URL validation testing.
Related Searches
Python Crawler (4,545)
Web Crawler Webscraper (1,659)
Javascript Crawler (1,142)
Python Webscraper (1,022)
Crawler Scrapy (988)
Scraper Crawler (896)
Crawler Spider (709)
Scraper Webscraper (643)
Java Crawler (594)
Copyright 2018-2024 Awesome Open Source. All rights reserved.