Awesome Open Source
Search results for web crawler webscraper
web-crawler
webscraper
Crawlee
⭐
12,106
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlab
⭐
10,521
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Spider Flow
⭐
8,075
A next-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.
Awesome Crawler
⭐
5,859
A collection of awesome web crawlers and spiders in different languages
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Hakrawler
⭐
4,120
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Node Osmosis
⭐
4,083
Web scraper for NodeJS
Php Curl Class
⭐
3,208
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Webcollector
⭐
2,974
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Gecco
⭐
2,403
Easy-to-use, lightweight web crawler
Soup
⭐
2,074
Web Scraper in Go, similar to BeautifulSoup
Abot
⭐
1,991
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Pspider
⭐
1,675
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Tomorrow
⭐
1,463
Magic decorator syntax for asynchronous code in Python
100projectsofcode
⭐
1,293
A list of practical knowledge-building projects.
Lightnovel Crawler
⭐
1,185
Generate and download e-books from online sources.
Faster Than Requests
⭐
1,061
Faster requests on Python 3
Stealth
⭐
923
🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy
Storm Crawler
⭐
834
A scalable, mature and versatile web crawler based on Apache Storm
Spidr
⭐
775
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Fetchbot
⭐
758
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Marginaliasearch
⭐
711
Internet search engine for text-oriented websites. Indexing the small, old and weird web.
Xidel
⭐
611
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Scrapers
⭐
511
A list of scrapers from around the web.
Phpscraper
⭐
486
A universal web-util for PHP.
Browsertrix Crawler
⭐
470
Run a high-fidelity browser-based crawler in a single Docker container
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Wereadscan
⭐
447
A crawler that scans your purchased books on WeRead (微信读书) and downloads them as local PDFs
Ache
⭐
433
ACHE is a web crawler for domain-specific search.
Monkey Dl
⭐
428
Bulk download your favourite anime episodes from your favourite anime websites
Pulsarrpa
⭐
413
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Dcrawl
⭐
411
Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Kochat
⭐
383
Opensource Korean chatbot framework
Basketball_reference_web_scraper
⭐
382
NBA Stats API via Basketball Reference
Proxy_requests
⭐
381
A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)
Scrape Linkedin Selenium
⭐
353
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Hquery.php
⭐
345
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Google Maps Scraper
⭐
330
Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.
Archivebot
⭐
328
ArchiveBot, an IRC bot for archiving websites
Supercrawler
⭐
324
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Social Media Profile Scrapers
⭐
322
Fetch user's data across social media
Spidy
⭐
287
The simple, easy to use command line web crawler.
Crawler
⭐
285
Library for Rapid (Web) Crawler and Scraper Development
Gopa
⭐
281
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Web Scraping
⭐
276
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Ant
⭐
271
A web crawler for Go
Technicalconceptsforinterviews
⭐
265
Various technical concepts for interviews - Feel free to contribute and make it better!
Lagoujob
⭐
250
Job data mining repo for lagou.com
Football Data Collection
⭐
246
Web Scraper used to create Kaggle European Soccer database
Rcrawler
⭐
240
An R web crawler and scraper
Summarizer
⭐
236
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.
Getsy
⭐
234
A simple browser/client-side web scraper.
News Crawl
⭐
229
News crawling with StormCrawler - stores content as WARC
Infinitycrawler
⭐
221
A simple but powerful web crawler library for .NET
Amazon Scraper
⭐
219
A simple web scraper to extract Product Data and Pricing from Amazon
Crawler Commons
⭐
217
A set of reusable Java components that implement functionality common to any web crawler
Selenops
⭐
215
A Swift Web Crawler 🕷
Awesome Web Scraper
⭐
214
A collection of awesome web scrapers and crawlers.
Crawley
⭐
208
The unix-way web crawler
Strong Web Crawler
⭐
204
An advanced web crawler based on C#.NET, PhantomJS, and Selenium. Can execute JavaScript
Portia Dashboard
⭐
190
portia-dashboard is a visual web crawler based on scrapinghub/portia
Ignareo Isml Auto Voter
⭐
186
Ignareo the Carillon, a web crawler/spider template of ultimate high concurrency built for leprechauns. Carillons as the best web spiders; Long live the golden years of leprechauns! (ISML=international saimoe; 2022 ISML is last ISML)
Crawlab Lite
⭐
184
Lite version of Crawlab, a lightweight crawler management platform.
Daath Ai Parser
⭐
184
Daath AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.
Digger
⭐
180
Digger is a powerful and flexible web crawler implemented in pure Go
Zhihu Crawler People
⭐
179
A simple distributed crawler for Zhihu, plus data analysis
Antch
⭐
177
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Musicer
⭐
176
Aims to bring together music platforms such as NetEase Cloud Music, KuGou, QQ Music, and Kuwo in one place
Goscrape
⭐
172
Web scraper that can create an offline readable version of a website
Crawler_shopee_public
⭐
169
Asynchronous Shopee crawler + competitor seller analysis
Collector Http
⭐
162
Norconex Web Crawler (or spider) is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Screenslicer
⭐
159
Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JSON API. Currently unmaintained.
Cocrawler
⭐
159
CoCrawler is a versatile web crawler built using modern tools and concurrency.
Facebook_page_scraper
⭐
150
Scrapes facebook's pages front end with no limitations & provides a feature to turn data into structured JSON or CSV
Google News Scraper
⭐
144
Lightweight scraper for Google News
Not Your Average Web Crawler
⭐
130
A web crawler (for bug hunting) that gathers more than you can imagine.
Cascadia
⭐
128
Go cascadia package command line CSS selector
Ospider
⭐
124
Open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)
Proxy
⭐
123
A simple tool for fetching usable proxies from several websites.
Php Crawler
⭐
121
A PHP crawler that finds emails on the internet
Dyer
⭐
118
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
Evine
⭐
117
Interactive CLI Web Crawler
Geeksforgeeksscrapper
⭐
116
Scrapes g4g and creates PDF
Raspagem De Dados Para Iniciantes
⭐
115
Data scraping for beginners using Scrapy and other basic libraries
Html Metadata
⭐
115
MetaData html scraper and parser for Node.js (supports Promises and callback style)
Crawlbox
⭐
112
Easy way to brute-force web directory.
Gflare Tk
⭐
110
Open-Source Python Based SEO Web Crawler
Abotx
⭐
106
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Seleniumcrawler
⭐
105
An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Node Web Crawler
⭐
104
A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.
Haikei
⭐
102
HaiKei is an anime streaming website that uses the consumet API
Zhihu_crawler
⭐
100
A crawler for Zhihu
Facebook Marketplace Scraper
⭐
99
This repository contains a script to scrape Facebook Marketplace data using Playwright, BeautifulSoup and Streamlit.
Crawl Anywhere
⭐
98
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
Cowin Vaccine Notifier
⭐
97
Automated Python Script to retrieve vaccine slots availability and get notified when a slot is available.
Get Sauce
⭐
97
A command line program to download Hentai videos and images from multiple websites
Senpwai
⭐
97
A desktop app for tracking and batch downloading anime
Krawler
⭐
96
A web crawling framework written in Kotlin
Copyright 2018-2024 Awesome Open Source. All rights reserved.