Awesome Open Source
Search results for web crawler
1,485 search results found
Scrapy
⭐
49,918
Scrapy, a fast high-level web crawling & scraping framework for Python.
Huginn
⭐
41,465
Create agents that monitor and act on your behalf. Your agents are standing by!
Changedetection.io
⭐
13,943
Free, open source website change detection, website watcher, restock monitor, and notification service. Designed for simplicity: monitor which websites had a text change. Also covers website defacement monitoring and price change notification.
Crawlee
⭐
12,106
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlab
⭐
10,521
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Spider Flow
⭐
8,075
A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.
Katana
⭐
7,995
A next-generation crawling and spidering framework.
Awesome Web Scraping
⭐
6,060
List of libraries, tools and APIs for web scraping and data processing.
Awesome Crawler
⭐
5,859
A collection of awesome web crawlers and spiders in different languages
Ani Cli
⭐
5,724
A CLI tool to browse and play anime
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Douyin_tiktok_download_api
⭐
4,844
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T
Rod
⭐
4,505
A Devtools driver for web automation and scraping
Hakrawler
⭐
4,120
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Helium
⭐
4,113
Lighter web automation for Python
Node Osmosis
⭐
4,083
Web scraper for NodeJS
Browser Fingerprinting
⭐
3,353
Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web.
Php Curl Class
⭐
3,208
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Automatic Udemy Course Enroller Get Paid Udemy Courses For Free
⭐
3,010
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Webcollector
⭐
2,974
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Snoop
⭐
2,530
Snoop is an intelligence-gathering tool based on open data (OSINT world)
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Gecco
⭐
2,403
Easy-to-use, lightweight web crawler
Grab
⭐
2,292
Web Scraping Framework
Thal
⭐
2,268
Getting started with Puppeteer and Chrome Headless for Web Scraping
Gospider
⭐
2,190
Gospider - Fast web spider written in Go
Soup
⭐
2,074
Web Scraper in Go, similar to BeautifulSoup
Abot
⭐
1,991
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
30 Days Of Python
⭐
1,926
Learn Python for the next 30 (or so) Days.
Pythoncode Tutorials
⭐
1,923
The Python Code Tutorials
Pspider
⭐
1,675
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Scrapoxy
⭐
1,669
Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading it across multiple scrapers 🕸️. It also smartly handles traffic routing 🔀 to minimize bans and increase success rates 🚀.
Dat8
⭐
1,549
General Assembly's 2015 Data Science course in Washington, DC
Tomorrow
⭐
1,463
Magic decorator syntax for asynchronous code in Python
Rvest
⭐
1,434
Simple web scraping for R
How To Prevent Scraping
⭐
1,417
The ultimate guide on preventing Website Scraping
Webscraping From 0 To Hero
⭐
1,305
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
100projectsofcode
⭐
1,293
A list of practical knowledge-building projects.
Core
⭐
1,290
The complete web scraping toolkit for PHP.
Scrapeghost
⭐
1,283
👻 Experimental library for scraping websites using OpenAI's GPT API.
Requests Cache
⭐
1,244
Persistent HTTP cache for python requests
Lightnovel Crawler
⭐
1,185
Generate and download e-books from online sources.
Lectures
⭐
1,176
Lecture notes for EC 607
Django Dynamic Scraper
⭐
1,069
Creating Scrapy scrapers via the Django admin interface
Faster Than Requests
⭐
1,061
Faster requests on Python 3
Crosslinked
⭐
1,060
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
Curl_cffi
⭐
987
Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
Stealth
⭐
923
🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy
Selectolax
⭐
921
Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
Spider
⭐
907
A configurable web spider with an easy-to-use web console
Youtube_tutorials
⭐
889
Collection of scripts corresponding to LucidProgramming YouTube tutorials
User Agents
⭐
871
A JavaScript library for generating random user agents with data that's updated daily.
Rpa
⭐
858
UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++
Storm Crawler
⭐
834
A scalable, mature and versatile web crawler based on Apache Storm
Spidr
⭐
775
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Till
⭐
770
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Fetchbot
⭐
758
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Zhihu Spider
⭐
719
A web spider for zhihu.com
Gazpacho
⭐
716
🥫 The simple, fast, and modern web scraping library
Marginaliasearch
⭐
711
Internet search engine for text-oriented websites. Indexing the small, old and weird web.
Suckit
⭐
669
Suck the InTernet
Scrapy Fake Useragent
⭐
654
Random User-Agent middleware based on fake-useragent
Coolqlcool
⭐
626
Nextjs server to query websites with GraphQL
Xidel
⭐
611
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Gogoanime Api
⭐
575
Anime Streaming, Discovery API made with Cheerio and Express. Uses data from Gogoanime
Instascrape
⭐
554
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Scrapers
⭐
511
A list of scrapers from around the web.
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Jekyll
⭐
498
Jekyll-based static site for The Programming Historian
Facepager
⭐
490
Facepager was made for fetching publicly available data from YouTube, Twitter and other websites on the basis of APIs and web scraping.
Phpscraper
⭐
486
A universal web-util for PHP.
Browsertrix Crawler
⭐
470
Run a high-fidelity browser-based crawler in a single Docker container
Company Crawler
⭐
466
Crawlers for Tianyancha and Qichacha that scrape company information for specified keywords
Morph
⭐
454
Take the hassle out of web scraping
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Spidersuite
⭐
447
Advanced web spider/crawler for cyber security professionals
Wereadscan
⭐
447
A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs
Ache
⭐
433
ACHE is a web crawler for domain-specific search.
Google Search Results Python
⭐
432
Google Search Results via SERP API pip Python Package
Nytimes App
⭐
429
🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥
Monkey Dl
⭐
428
Bulk download your favourite anime episodes from your favourite anime websites
Pulsarrpa
⭐
413
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Dcrawl
⭐
411
Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Dude
⭐
397
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Kochat
⭐
383
Opensource Korean chatbot framework
Basketball_reference_web_scraper
⭐
382
NBA Stats API via Basketball Reference
Proxy_requests
⭐
381
A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)
Tarsier
⭐
372
Vision utilities for web interaction agents 👀
Learn To Identify Similar Images
⭐
358
A record of my Python scripts for learning to identify similar images
Finvizfinance
⭐
355
Finviz analysis python library.
Http Proxy List
⭐
355
A lightweight project that, every 10 minutes, scrapes many free-proxy sites, validates that each proxy works, and serves a clean proxy list.
Scrape Linkedin Selenium
⭐
353
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Hquery.php
⭐
345
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
R Web Scraping Cheat Sheet
⭐
339
Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.
Google Maps Scraper
⭐
330
Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email and more for each place
Archivebot
⭐
328
ArchiveBot, an IRC bot for archiving websites
Supercrawler
⭐
324
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Social Media Profile Scrapers
⭐
322
Fetch user's data across social media
1-100 of 1,485 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.