Awesome Open Source


Search results for web crawler

1,485 search results found

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

Huginn ⭐ 41,465

Create agents that monitor and act on your behalf. Your agents are standing by!

Changedetection.io ⭐ 13,943

The best and simplest free, open-source website change detection, website watcher, restock monitor, and notification service. Designed for simplicity: simply monitor which websites had a text change, for free. Also covers web page defacement monitoring and price change notifications.

Crawlee ⭐ 12,106

Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers, in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

Crawlab ⭐ 10,521

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

Spider Flow ⭐ 8,075

A next-generation crawler platform that defines crawling workflows graphically, so you can build crawlers without writing code.

Katana ⭐ 7,995

A next-generation crawling and spidering framework.

Awesome Web Scraping ⭐ 6,060

List of libraries, tools and APIs for web scraping and data processing.

Awesome Crawler ⭐ 5,859

A collection of awesome web crawlers and spiders in different languages

Ani Cli ⭐ 5,724

A CLI tool to browse and play anime

Autoscraper ⭐ 5,159

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Douyin_tiktok_download_api ⭐ 4,844

🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance, asynchronous Douyin, Kuaishou, T…

A Devtools driver for web automation and scraping

Hakrawler ⭐ 4,120

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Helium ⭐ 4,113

Lighter web automation for Python

Node Osmosis ⭐ 4,083

Web scraper for Node.js

Browser Fingerprinting ⭐ 3,353

Analysis of bot protection systems with available countermeasures 🚿. How do you defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Php Curl Class ⭐ 3,208

PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs

Automatic Udemy Course Enroller Get Paid Udemy Courses For Free ⭐ 3,010

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

Webcollector ⭐ 2,974

WebCollector is an open-source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

Nutch ⭐ 2,742

Apache Nutch is an extensible and scalable web crawler

Snoop ⭐ 2,530

Snoop: a reconnaissance tool based on open data (OSINT world)

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Gecco ⭐ 2,403

Easy-to-use, lightweight web crawler

Web Scraping Framework

Getting started with Puppeteer and Chrome Headless for Web Scraping

Gospider ⭐ 2,190

Gospider - Fast web spider written in Go

Web Scraper in Go, similar to BeautifulSoup

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

30 Days Of Python ⭐ 1,926

Learn Python for the next 30 (or so) Days.

Pythoncode Tutorials ⭐ 1,923

The Python Code Tutorials

Pspider ⭐ 1,675

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Scrapoxy ⭐ 1,669

Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading it across multiple scrapers 🕸️. It also smartly handles traffic routing 🔀 to minimize bans and increase success rates 🚀.

General Assembly's 2015 Data Science course in Washington, DC

Tomorrow ⭐ 1,463

Magic decorator syntax for asynchronous code in Python

Rvest ⭐ 1,434

Simple web scraping for R

How To Prevent Scraping ⭐ 1,417

The ultimate guide on preventing Website Scraping

Webscraping From 0 To Hero ⭐ 1,305

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

100projectsofcode ⭐ 1,293

A list of practical knowledge-building projects.

The complete web scraping toolkit for PHP.

Scrapeghost ⭐ 1,283

👻 Experimental library for scraping websites using OpenAI's GPT API.

Requests Cache ⭐ 1,244

Persistent HTTP cache for python requests

Lightnovel Crawler ⭐ 1,185

Generate and download e-books from online sources.

Lectures ⭐ 1,176

Lecture notes for EC 607

Django Dynamic Scraper ⭐ 1,069

Creating Scrapy scrapers via the Django admin interface

Faster Than Requests ⭐ 1,061

Faster requests on Python 3

Crosslinked ⭐ 1,060

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

Curl_cffi ⭐ 987

Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

Stealth ⭐ 923

🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy

Selectolax ⭐ 921

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

A configurable web spider with an easy-to-use web console

Youtube_tutorials ⭐ 889

Collection of scripts corresponding to LucidProgramming YouTube tutorials

User Agents ⭐ 871

A JavaScript library for generating random user agents with data that's updated daily.

UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++

Storm Crawler ⭐ 834

A scalable, mature and versatile web crawler based on Apache Storm

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Fetchbot ⭐ 758

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

Zhihu Spider ⭐ 719

A web spider for zhihu.com

Gazpacho ⭐ 716

🥫 The simple, fast, and modern web scraping library

Marginaliasearch ⭐ 711

Internet search engine for text-oriented websites. Indexing the small, old and weird web.

Suck the InTernet

Scrapy Fake Useragent ⭐ 654

Random User-Agent middleware based on fake-useragent

Coolqlcool ⭐ 626

Next.js server to query websites with GraphQL

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Gogoanime Api ⭐ 575

Anime Streaming, Discovery API made with Cheerio and Express. Uses data from Gogoanime

Instascrape ⭐ 554

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

Scrapers ⭐ 511

A list of scrapers from around the web.

Complete Life Cycle Of A Data Science Project ⭐ 499

Complete-Life-Cycle-of-a-Data-Science-Project

Jekyll-based static site for The Programming Historian

Facepager ⭐ 490

Facepager was made for fetching publicly available data from YouTube, Twitter, and other websites on the basis of APIs and web scraping.

Phpscraper ⭐ 486

A universal web-util for PHP.

Browsertrix Crawler ⭐ 470

Run a high-fidelity browser-based crawler in a single Docker container

Company Crawler ⭐ 466

Crawler for Tianyancha and Qichacha: crawls company information by specified keywords

Take the hassle out of web scraping

Scrapple ⭐ 452

A framework for creating semi-automatic web content extractors

Spidersuite ⭐ 447

Advanced web spider/crawler for cybersecurity professionals

Wereadscan ⭐ 447

A crawler that scans purchased books in WeChat Read (微信读书) and downloads them as local PDFs

ACHE is a web crawler for domain-specific search.

Google Search Results Python ⭐ 432

Google Search Results via SERP API pip Python Package

Nytimes App ⭐ 429

🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥

Monkey Dl ⭐ 428

Bulk download your favourite anime episodes from your favourite anime websites

Pulsarrpa ⭐ 413

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.

Sparkler ⭐ 401

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Opensource Korean chatbot framework

Basketball_reference_web_scraper ⭐ 382

NBA Stats API via Basketball Reference

Proxy_requests ⭐ 381

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

Tarsier ⭐ 372

Vision utilities for web interaction agents 👀

Learn To Identify Similar Images ⭐ 358

Records my Python scripts for learning to identify similar images

Finvizfinance ⭐ 355

Finviz analysis python library.

Http Proxy List ⭐ 355

A lightweight project that, every 10 minutes, scrapes lots of free-proxy sites, validates whether each proxy works, and serves a clean proxy list.

Scrape Linkedin Selenium ⭐ 353

`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.

Hquery.php ⭐ 345

An extremely fast web scraper that parses megabytes of invalid HTML in the blink of an eye. PHP 5.3+, no dependencies.

R Web Scraping Cheat Sheet ⭐ 339

Guide, reference, and cheat sheet on web scraping using rvest, httr, and RSelenium.

Google Maps Scraper ⭐ 330

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.

Archivebot ⭐ 328

ArchiveBot, an IRC bot for archiving websites

Supercrawler ⭐ 324

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Social Media Profile Scrapers ⭐ 322

Fetch users' data across social media




Copyright 2018-2024 Awesome Open Source. All rights reserved.