Awesome Open Source

Search results for crawler web crawler

214 search results found

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

Crawlee ⭐ 12,101

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Crawlab ⭐ 10,521

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

Spider Flow ⭐ 8,075

A next-generation crawler platform that defines crawler workflows graphically, so a crawler can be built without writing code.

Katana ⭐ 7,995

A next-generation crawling and spidering framework.

Awesome Web Scraping ⭐ 6,060

List of libraries, tools and APIs for web scraping and data processing.

Awesome Crawler ⭐ 5,859

A collection of awesome web crawlers and spiders in different languages

Autoscraper ⭐ 5,159

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Douyin_tiktok_download_api ⭐ 4,844

🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T

A Devtools driver for web automation and scraping

Hakrawler ⭐ 4,120

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Browser Fingerprinting ⭐ 3,353

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Webcollector ⭐ 2,974

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

Nutch ⭐ 2,742

Apache Nutch is an extensible and scalable web crawler

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Gecco ⭐ 2,403

Easy-to-use lightweight web crawler

Web Scraping Framework

Gospider ⭐ 2,190

Gospider - Fast web spider written in Go

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

Pspider ⭐ 1,675

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

The complete web scraping toolkit for PHP.

Lightnovel Crawler ⭐ 1,185

Generate and download e-books from online sources.

Storm Crawler ⭐ 834

A scalable, mature and versatile web crawler based on Apache Storm

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Fetchbot ⭐ 758

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

Browsertrix Crawler ⭐ 470

Run a high-fidelity browser-based crawler in a single Docker container

Scrapple ⭐ 452

A framework for creating semi-automatic web content extractors

Spidersuite ⭐ 447

Advanced web spider/crawler for cyber security professionals

Pulsarrpa ⭐ 413

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.

Sparkler ⭐ 401

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Hquery.php ⭐ 345

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

Archivebot ⭐ 328

ArchiveBot, an IRC bot for archiving websites

Supercrawler ⭐ 324

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Be nice on the web

The simple, easy to use command line web crawler.

Crawler ⭐ 285

Library for Rapid (Web) Crawler and Scraper Development

Web Scraping ⭐ 281

More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Youtube Projects ⭐ 272

This repository contains all the code I use in my YouTube tutorials.

A web crawler for Go

Lagoujob ⭐ 250

Job data mining repo for lagou.com

Football Data Collection ⭐ 246

Web Scraper used to create Kaggle European Soccer database

Rcrawler ⭐ 240

An R web crawler and scraper

Laravel ⭐ 238

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

A tool for crawling and analyzing keywords from Vietnamese online news sites

Nudecrawler ⭐ 231

Crawl telegra.ph searching for nudes!

News Crawl ⭐ 229

News crawling with StormCrawler - stores content as WARC

Infinitycrawler ⭐ 221

A simple but powerful web crawler library for .NET

Wayback Machine Scraper ⭐ 219

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Crawler Commons ⭐ 217

A set of reusable Java components that implement functionality common to any web crawler

Selenops ⭐ 215

A Swift Web Crawler 🕷

Make a ZIM file from any Web site and surf offline!

Crawley ⭐ 208

The unix-way web crawler

Strong Web Crawler ⭐ 204

An advanced web crawler built on C#.NET + PhantomJS + Selenium. Can execute Javascrip

Portia Dashboard ⭐ 190

portia-dashboard is a visual web crawler based on scrapinghub/portia

Crawlab Lite ⭐ 184

Lite version of Crawlab, the crawler management platform.

Digger is a powerful and flexible web crawler implemented in pure Go

Zhihu Crawler People ⭐ 179

A simple distributed crawler for zhihu && data analysis

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Crawler_shopee_public ⭐ 169

Asynchronous Shopee crawler + competitor seller analysis

Collector Http ⭐ 162

Norconex Web Crawler (or spider) is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

Cocrawler ⭐ 159

CoCrawler is a versatile web crawler built using modern tools and concurrency.

Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: Canal Eletronico do Investidor, CEI, selenium, bovespa, IRPF, IR, income tax, finance, yahoo finance, stocks, FII, ETF, python, crawler, webscraping, income tax calculator

Juno_crawler ⭐ 147

Scrapy crawler to collect data on the back catalog of songs listed for sale.

Google News Scraper ⭐ 144

Lightweight scraper for Google News

estela, an elastic web scraping cluster 🕸

Scrapy Training ⭐ 141

Scrapy Training companion code

Not Your Average Web Crawler ⭐ 130

A web crawler (for bug hunting) that gathers more than you can imagine.

A simple tool for fetching usable proxies from several websites.

Php Crawler ⭐ 121

A PHP crawler that finds emails on the internet

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

Interactive CLI Web Crawler

Crawlbox ⭐ 112

Easy way to brute-force web directory.

Gflare Tk ⭐ 110

Open-Source Python Based SEO Web Crawler

Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.

Seleniumcrawler ⭐ 105

An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site

Node Web Crawler ⭐ 104

A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.

Zhihu_crawler ⭐ 100

a crawler for zhihu

Crawl Anywhere ⭐ 98

Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.

A web crawling framework written in Kotlin

Polipus: distributed and scalable web-crawler framework

Terpene Profile Parser For Cannabis Strains ⭐ 93

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Actor Scraper ⭐ 93

House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

Scrapyd Cluster On Heroku ⭐ 90

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

A platform displaying the latest software engineer job information to entry-level new graduates

Webcrawler ⭐ 86

Web crawler to download pictures from zhihu.com

Web Crawler for Crabs

Bathyscaphe ⭐ 83

Fast, highly configurable, cloud native dark web crawler.

Scrapper ⭐ 83

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

Arachnid ⭐ 80

Powerful web scraping framework for Crystal

Node Search Engine ⭐ 79

Sample search engine with web crawler, built on Node.js + CouchDB + Limestone

Webscrapper ⭐ 77

Simple and powerful all-in-one Telegram bot to scrape webpages using Requests, html5lib and BeautifulSoup

Goodreadsscraper ⭐ 76

Scrape data from Goodreads using Scrapy and Selenium 📚

Scrapfly Scrapers ⭐ 76

Web scrapers for popular targets, powered by Scrapfly.io

Scraping Ebay ⭐ 73

Scraping Ebay's products using Scrapy Web Crawling Framework

Spotifyscraper ⭐ 72

Spotify scraper that extracts all the information from Spotify and downloads MP3s with the song's cover art

Davedavefind ⭐ 71

A simple search engine based on the web crawler developed in Udacity's CS101 course.

Related Searches

Python Crawler (4,545)

Scraper Web Crawler (1,388)

Javascript Crawler (1,142)

Crawler Spider (1,073)

Crawler Scrapy (988)

Scraper Crawler (896)

Java Crawler (807)

1-100 of 214 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.