Awesome Open Source

Search results for scraper web crawler

39 search results found

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

Huginn ⭐ 41,465

Create agents that monitor and act on your behalf. Your agents are standing by!

Crawlee ⭐ 12,106

Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

Awesome Web Scraping ⭐ 6,060

List of libraries, tools and APIs for web scraping and data processing.

Awesome Crawler ⭐ 5,859

A collection of awesome web crawlers and spiders in different languages

Autoscraper ⭐ 5,159

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Douyin_tiktok_download_api ⭐ 4,844

🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T…

A Devtools driver for web automation and scraping

Node Osmosis ⭐ 4,083

Web scraper for NodeJS

Browser Fingerprinting ⭐ 3,353

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Automatic Udemy Course Enroller Get Paid Udemy Courses For Free ⭐ 3,010

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

Snoop ⭐ 2,530

Snoop: an intelligence-gathering tool based on open data (OSINT world)

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Web Scraping Framework

Getting started with Puppeteer and Chrome Headless for Web Scraping

Web Scraper in Go, similar to BeautifulSoup

Tomorrow ⭐ 1,463

Magic decorator syntax for asynchronous code in Python

Rvest ⭐ 1,434

Simple web scraping for R

How To Prevent Scraping ⭐ 1,417

The ultimate guide on preventing Website Scraping

Django Dynamic Scraper ⭐ 1,069

Creating Scrapy scrapers via the Django admin interface

Stealth ⭐ 923

🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Gazpacho ⭐ 716

🥫 The simple, fast, and modern web scraping library

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Gogoanime Api ⭐ 575

Anime Streaming, Discovery API made with Cheerio and Express. Uses data from Gogoanime

Instascrape ⭐ 554

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

Scrapers ⭐ 511

A list of scrapers from around the web.

Complete Life Cycle Of A Data Science Project ⭐ 499

Complete-Life-Cycle-of-a-Data-Science-Project

Jekyll-based static site for The Programming Historian

Phpscraper ⭐ 486

A universal web-util for PHP.

Take the hassle out of web scraping

Scrapple ⭐ 452

A framework for creating semi-automatic web content extractors

Google Search Results Python ⭐ 432

Google Search Results via SERP API pip Python Package

Pulsarrpa ⭐ 413

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Basketball_reference_web_scraper ⭐ 382

NBA Stats API via Basketball Reference

Scrape Linkedin Selenium ⭐ 353

`scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

Hquery.php ⭐ 345

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

R Web Scraping Cheat Sheet ⭐ 339

Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.

City Scrapers ⭐ 315

Scrape, standardize and share public meetings from local government websites

Mov Cli ⭐ 314

A cli tool to browse and watch Movies/Shows/TV/Sports.

Be nice on the web

Crawler ⭐ 285

Library for Rapid (Web) Crawler and Scraper Development

Pricewise ⭐ 284

Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails, deployment, and more.

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Web Scraping ⭐ 281

More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup

Web Scraping ⭐ 276

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Youtube Projects ⭐ 272

This repository contains all the code I use in my YouTube tutorials.

A web crawler for Go

Dark Fantasy Hack Tool ⭐ 248

DDOS Tool: To take down small websites with HTTP FLOOD. Port scanner: To know the open ports of a site. FTP Password Cracker: To hack file system of websites. Banner Grabber: To get the service or software running on a port. (After identifying the running software, search Google for its vulnerabilities.) Web Spider: For gathering web application hacking information. Email scraper: To get all emails related to a webpage. IMDB Rating: Easy way to access the movie database. Available as both .exe (compressed as zip) and .py.

Rcrawler ⭐ 240

An R web crawler and scraper

Summarizer ⭐ 236

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.

A simple browser/client-side web scraper.

Nudecrawler ⭐ 231

Crawl telegra.ph searching for nudes!

Fb_friend_list_scraper ⭐ 226

OSINT tool to scrape names and usernames from large friend lists on Facebook, without being rate limited.

Amazon Scraper ⭐ 219

A simple web scraper to extract Product Data and Pricing from Amazon

Wayback Machine Scraper ⭐ 219

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Make a ZIM file from any Web site and surf offline!

Humanoid ⭐ 191

Node.js package to bypass CloudFlare's anti-bot JavaScript challenges

Letterboxd_recommendations ⭐ 190

Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username

Open Source web scraping API. Falkor turns web pages into queryable JSON

Daath Ai Parser ⭐ 184

Daath AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Ayakashi ⭐ 177

⚡ Ayakashi.io - The next generation web scraping framework

Trump Lies ⭐ 175

Tutorial: Web scraping in Python with Beautiful Soup

Scraping With Rust ⭐ 174

👾 scraping hacker news with rust

Goscrape ⭐ 172

Web scraper that can create an offline readable version of a website

A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.

Screenslicer ⭐ 159

Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JSON API. Currently unmaintained.

Facebook_page_scraper ⭐ 150

Scrapes the front end of Facebook pages with no limitations and can turn the data into structured JSON or CSV

Web Scraper Chrome Extension ⭐ 149

Web data extraction tool implemented as chrome extension

Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)

Google News Scraper ⭐ 144

Lightweight scraper for Google News

Web Database Analytics ⭐ 144

Web scraping and related analytics using Python tools

Saveddit ⭐ 143

Bulk Downloader for Reddit

estela, an elastic web scraping cluster 🕸

Amazon Scraper ⭐ 140

Free Trial Amazon Scraper API for extracting search, product, offer listing, reviews, question and answers, best sellers and sellers data.

Hockey Scraper ⭐ 134

Python Package for scraping NHL Play-by-Play and Shift data

Not Your Average Web Crawler ⭐ 130

A web crawler (for bug hunting) that gathers more than you can imagine.

Zillow Scraper for Python using Selenium

Scrapers ⭐ 128

Lots and lots of web scrapers

Interactive CLI Web Crawler

Html Metadata ⭐ 115

MetaData html scraper and parser for Node.js (supports Promises and callback style)

Raspagem De Dados Para Iniciantes ⭐ 115

Data scraping for beginners using Scrapy and other basic libraries

Homeharvest ⭐ 114

Python package for real estate scraping of MLS listing data

Gflare Tk ⭐ 110

Open-Source Python Based SEO Web Crawler

Seleniumcrawler ⭐ 105

An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site

Node Web Crawler ⭐ 104

A web scraper with a web user interface which shows scraping stats in realtime. Uses Node.JS, jQuery, socket.io and Express.

Scraper ⭐ 104

Web scraper for scraping, tracking and visualizing prices of products on various websites.

Code for the second edition Web Scraping with Python book by Packt Publications

Brutescrape ⭐ 95

A web scraper for generating password files based on plain text found

Actor Scraper ⭐ 93

House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

Rymscraper ⭐ 92

Python API to extract data from rateyourmusic.com.

Bing Ip2hosts ⭐ 91

bingip2hosts is a Bing.com web scraper that discovers websites by IP address

AnimeEZ - A free anime streaming website without any ads (Demo - https://animeez.live). By the way, it's made in HTML.

Retrieve information from BoxRec and return it in JSON format

A platform displaying the latest software engineer job information to entry-level new graduates

Jsongenius ⭐ 85

Get structured JSON data from any page.

Web Crawler for Crabs

Copyright 2018-2024 Awesome Open Source. All rights reserved.