Awesome Open Source

Search results for scraper

4,239 search results found

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

Huginn ⭐ 41,465

Create agents that monitor and act on your behalf. Your agents are standing by!

Devdocs ⭐ 33,315

API Documentation Browser

Cheerio ⭐ 27,702

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

👾 Fast and simple video download library and CLI tool written in Go

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

Easyspider ⭐ 20,149

A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual, no-code, graphical tool for browser automation testing, data collection, and web crawling

Newspaper ⭐ 13,147

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Requests Html ⭐ 13,100

Pythonic HTML Parsing for Humans™

Crawlee ⭐ 12,106

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Webmagic ⭐ 11,080

A scalable web crawler framework for Java.

Jsoup ⭐ 10,463

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

Chinese Xinhua ⭐ 10,425

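📙 Chinese Xinhua dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.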

Portia ⭐ 8,982

Visual scraping for Scrapy

Avbook ⭐ 8,777

Adult video (AV) management system with crawlers for avmoo, javbus, and javlibrary; an online AV library and AV magnet link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)

Undetected Chromedriver ⭐ 7,232

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)

Tabula ⭐ 6,488

Tabula is a tool for liberating data tables trapped inside PDF files

Awesome Web Scraping ⭐ 6,060

List of libraries, tools and APIs for web scraping and data processing.

Awesome Crawler ⭐ 5,859

A collection of awesome web crawlers and spiders in different languages

Ferret ⭐ 5,540

Declarative web scraping

Autoscraper ⭐ 5,159

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Headless Chrome Crawler ⭐ 5,051

Distributed crawler powered by Headless Chrome

Douyin_tiktok_download_api ⭐ 4,844

🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance, asynchronous Douyin, Kuaishou, T

A Devtools driver for web automation and scraping

Mechanize ⭐ 4,338

Mechanize is a Ruby library that makes automated web interaction easy.

Mygptreader ⭐ 4,267

A community-driven way to read and chat with AI bots - powered by chatGPT.

Python Scraping ⭐ 4,136

Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do

Node Ytdl Core ⭐ 4,087

YouTube video downloader in JavaScript.

Node Osmosis ⭐ 4,083

Web scraper for NodeJS

Snscrape ⭐ 3,992

A social networking service scraper in Python

Scrape It ⭐ 3,978

🔮 A Node.js scraper for humans.

Data Science ⭐ 3,898

Collection of useful data science topics along with articles, videos, and code

Scraperjs ⭐ 3,575

A complete and versatile web scraper.

Tiktok Scraper ⭐ 3,554

TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URLs, etc.

Fake Useragent ⭐ 3,356

Up-to-date, simple user-agent faker with a real-world database

Browser Fingerprinting ⭐ 3,353

Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Automatic Udemy Course Enroller Get Paid Udemy Courses For Free ⭐ 3,010

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

Python ⭐ 2,978

Python Books && Courses

Instagram Php Scraper ⭐ 2,928

Get account information, photos, videos, stories and comments.

Tweets_analyzer ⭐ 2,894

Tweets metadata scraper & activity analyzer

Panther ⭐ 2,849

A browser testing and web crawling library for PHP and Symfony

Emby.plugins.javscraper ⭐ 2,687

A Japanese movie scraper plugin for Emby/Jellyfin that fetches film information from certain websites.

Querylist ⭐ 2,598

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Googlescraper ⭐ 2,540

A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support.

Snoop ⭐ 2,530

Snoop: a reconnaissance tool based on open data (OSINT world)

Aos Avp ⭐ 2,515

NOVA opeN sOurce Video plAyer: main repository to build them all

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Web Scraping Framework

Getting started with Puppeteer and Chrome Headless for Web Scraping

Weibo_terminater ⭐ 2,265

Final Weibo crawler. Scrape anything from Weibo: comments, Weibo contents, followers, anything. The Terminator

Awesome Puppeteer ⭐ 2,245

A curated list of awesome puppeteer resources.

Bulk Downloader For Reddit ⭐ 2,142

Downloads and archives content from reddit

Freedictionaryapi ⭐ 2,115

There was no free Dictionary API on the web when I wanted one for my friend, so I created one.

Google Play Scraper ⭐ 2,108

Node.js scraper to get data from Google Play

Web Scraper in Go, similar to BeautifulSoup

Embed ⭐ 2,052

Get info from any web service or page

Facebook Page Post Scraper ⭐ 2,014

Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

Facebook Scraper ⭐ 1,936

Scrape Facebook public pages without an API key

Geziyor ⭐ 1,892

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

Twitterscraper ⭐ 1,852

Scrape Twitter for Tweets

Node.io ⭐ 1,845

Nyt 2020 Election Scraper ⭐ 1,788

Content ⭐ 1,711

Official content for Harvard CS109

Scrapely ⭐ 1,668

A pure-Python HTML screen-scraping library

Scraper ⭐ 1,639

HTML parsing and querying with CSS selectors

Upton ⭐ 1,615

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

Linkedin_scraper ⭐ 1,534

A library that scrapes LinkedIn for user data

Jobfunnel ⭐ 1,533

Scrape job websites into a single spreadsheet with no duplicates.

Scrape ⭐ 1,464

A simple, higher level interface for Go web scraping.

Tomorrow ⭐ 1,463

Magic decorator syntax for asynchronous code in Python

Node Website Scraper ⭐ 1,456

Download website to local directory (including all css, images, js, etc.)

Rvest ⭐ 1,434

Simple web scraping for R

Snmp_exporter ⭐ 1,433

SNMP Exporter for Prometheus

How To Prevent Scraping ⭐ 1,417

The ultimate guide on preventing Website Scraping

Recipe Scrapers ⭐ 1,408

Python package for scraping recipes data

Article Extractor ⭐ 1,297

To extract main article from given URL with Node.js

Wombat ⭐ 1,297

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Jd Autobuy ⭐ 1,291

Python crawler for JD.com (Jingdong): automatic login and online flash purchasing of products

Shot Scraper ⭐ 1,285

A command-line utility for taking automated screenshots of websites

Cloudproxy ⭐ 1,235

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

Cariddi ⭐ 1,228

Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens, and more

DOM Traversing and Scraping using GraphQL

Cinemagoer ⭐ 1,156

Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies

Informer ⭐ 1,141

A Telegram Mass Surveillance Bot in Python

Scrapy Cluster ⭐ 1,137

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Event Collect ⭐ 1,110

Scraper and converter from event website listings to the Open Event format

Animdl ⭐ 1,105

A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

Fansly Downloader ⭐ 1,103

Easy-to-use fansly.com content downloading tool. Written in Python, but also ships as a standalone executable app for Windows. Enjoy your Fansly content offline anytime, anywhere, in the highest possible content resolution! Fully customizable to download in bulk or individually: photos, videos & audio from timeline, messages, collection & specific posts 👍

Loklak_scraper_js ⭐ 1,094

Scrapers for loklak in JavaScript

Newpipeextractor ⭐ 1,070

NewPipe's core library for extracting data from streaming sites

Django Dynamic Scraper ⭐ 1,069

Creating Scrapy scrapers via the Django admin interface

Scanless ⭐ 1,061

Online port scan scraper

Parliament Scraper ⭐ 1,049

Public Data Scraper for Parliament Data for the EU and other Parliaments

Crawler User Agents ⭐ 1,045

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome ⭐

Redditdownloader ⭐ 1,045

Scrapes Reddit to download media of your choice.

Osi.ig ⭐ 1,027

Information Gathering Instagram.

Artoo ⭐ 1,024

artoo.js - the client-side scraping companion.

Parsel ⭐ 1,010

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Pjscrape ⭐ 1,003

A web-scraping framework written in JavaScript, using PhantomJS and jQuery

📖 The most advanced (yet simple) CLI manga downloader in the entire universe! Lua scrapers, export formats, AniList integration, fancy TUI and more!

1-100 of 4,239 search results
