Awesome Open Source
Search results for python scraper
3,122 search results found
Scrapy
⭐
47,417
Scrapy, a fast high-level web crawling & scraping framework for Python.
Requests Html
⭐
13,100
Pythonic HTML Parsing for Humans™
Newspaper
⭐
12,678
News, full-text, and article metadata extraction in Python 3.
Chinese Xinhua
⭐
9,789
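📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.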
Portia
⭐
8,781
Visual scraping for Scrapy
Undetected Chromedriver
⭐
5,241
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Autoscraper
⭐
5,159
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Mygptreader
⭐
4,019
A community-driven way to read and chat with AI bots, powered by ChatGPT.
Python Scraping
⭐
3,908
Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do
Data Science
⭐
3,716
Collection of useful data science topics along with articles, videos, and code
Snscrape
⭐
3,542
A social networking service scraper in Python
Onlyfans
⭐
3,351
Scrape all the media from an OnlyFans account - Updated regularly
Fake Useragent
⭐
3,041
Up-to-date simple useragent faker with real world database
Python
⭐
2,978
Python Books && Courses
Automatic Udemy Course Enroller Get Paid Udemy Courses For Free
⭐
2,892
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Douyin_tiktok_download_api
⭐
2,816
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin|TikT
Tweets_analyzer
⭐
2,573
Tweets metadata scraper & activity analyzer
Googlescraper
⭐
2,495
A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support.
Weibo_terminater
⭐
2,265
Final Weibo crawler: scrape anything from Weibo (comments, post contents, followers, anything). The Terminator.
Grab
⭐
2,252
Web Scraping Framework
Facebook Page Post Scraper
⭐
2,014
Data scraper for Facebook Pages, and also code accompanying the blog post "How to Scrape Data From Facebook Page Posts for Statistical Analysis"
Snoop
⭐
1,866
Snoop: a reconnaissance tool based on open data (OSINT world)
Twitterscraper
⭐
1,852
Scrape Twitter for Tweets
Bulk Downloader For Reddit
⭐
1,822
Downloads and archives content from reddit
Facebook Scraper
⭐
1,739
Scrape Facebook public pages without an API key
Jobfunnel
⭐
1,533
Scrape job websites into a single spreadsheet with no duplicates.
Jd Autobuy
⭐
1,263
Python crawler with automatic JD.com login and online flash purchasing of products
Cloudproxy
⭐
1,235
Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
Recipe Scrapers
⭐
1,198
Python package for scraping recipes data
Gdom
⭐
1,180
DOM Traversing and Scraping using GraphQL
Informer
⭐
1,141
A Telegram Mass Surveillance Bot in Python
Linkedin_scraper
⭐
1,111
A library that scrapes LinkedIn for user data
Cinemagoer
⭐
1,111
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies
Event Collect
⭐
1,110
Scraper and converter from event website listings to the Open Event format
Django Dynamic Scraper
⭐
1,069
Creating Scrapy scrapers via the Django admin interface
Parliament Scraper
⭐
1,049
Public Data Scraper for Parliament Data for the EU and other Parliaments
Trafilatura
⭐
1,032
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Scanless
⭐
1,021
Online port scan scraper
Scrapy Cluster
⭐
1,016
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Redditdownloader
⭐
989
Scrapes Reddit to download media of your choice.
Shot Scraper
⭐
984
A command-line utility for taking automated screenshots of websites
Crawler User Agents
⭐
956
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome ⭐️
Osi.ig
⭐
944
Instagram information gathering.
Mlscraper
⭐
935
🤖 Scrape data from HTML websites automatically by just providing examples
Instagram Crawler
⭐
922
Get Instagram posts/profile/hashtag data without using Instagram API
Animdl
⭐
922
A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.
Parsel
⭐
922
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Finviz
⭐
885
Unofficial API for finviz.com
Querido Diario
⭐
860
📰 Brazilian government gazettes, accessible to everyone.
Clean Text
⭐
810
🧹 Python package for text cleaning
Ultimascraper
⭐
798
Scrape content from OnlyFans and Fansly
Amazon Scraper Python
⭐
766
Non-official client to get some info about products sold on Amazon
Lulu
⭐
752
[Unmaintained] A simple and clean video/music/image downloader 👾
Scweet
⭐
720
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Domain
⭐
713
Setup script for Recon-ng
Edu Mail Generator
⭐
707
Generate Free Edu Mail(s) within minutes
Scrapyrt
⭐
701
HTTP API for Scrapy spiders
Pdfquery
⭐
693
A fast and friendly PDF scraping library.
Dataengineeringproject
⭐
644
Example end to end data engineering project.
Loconotion
⭐
634
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Easy Scraping Tutorial
⭐
618
Simple but useful Python web scraping tutorial code.
Kuwala
⭐
610
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Urs
⭐
604
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool.
Imagescraper
⭐
572
✂️ High performance, multi-threaded image scraper
Bot
⭐
570
Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated.
Instascrape
⭐
554
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Lookyloo
⭐
553
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Gazpacho
⭐
543
🥫 The simple, fast, and modern web scraping library
Google Play Scraper
⭐
543
Google Play scraper for Python, inspired by facundoolano/google-play-scraper
Bookcorpus
⭐
536
Crawl BookCorpus
Dryscrape
⭐
523
[Not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages
Fansly Downloader
⭐
519
Executable downloader app: an absolute must-have for Fansly enthusiasts. With this easy-to-use content downloading tool, you can download all your favorite content from fansly.com. No more manual downloads; enjoy your Fansly content offline anytime, anywhere! Fully customizable to download photos, videos, messages, collections, and single posts 🔥
Social Media Profiles Regexs
⭐
508
📇 Extract social media profiles and more with regular expressions
Linkedin
⭐
498
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Goop
⭐
488
Google Search Scraper
Jekyll
⭐
486
Jekyll-based static site for The Programming Historian
Comic Dl
⭐
480
Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver, and many more.
Twitter_scraping
⭐
479
Grab all of a user's tweets (and get past the 3,200 limit)
Episode Rename
⭐
473
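Automated renaming tool for TV series and anime, with one-click batch renaming. Automatically renames files after qBittorrent downloads them, making automatic metadata scraping in Emby easier. Runs on Windows, Linux, macOS, Docker, and as a Synology package.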
Openwebtext
⭐
463
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
Spidermon
⭐
458
Scrapy extension for monitoring spider execution.
Scrapple
⭐
452
A framework for creating semi-automatic web content extractors
Awesome Scrapy
⭐
450
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
Covid_19
⭐
432
COVID19 case numbers of Cantons of Switzerland and Principality of Liechtenstein (FL). The data is updated at best once a day (times of collection and update may vary). Start with the README.
Cryptocmd
⭐
427
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Newsdiffs
⭐
418
Automatic scraper that tracks changes in news articles over time.
Complete Life Cycle Of A Data Science Project
⭐
417
Complete-Life-Cycle-of-a-Data-Science-Project
Fbcrawl
⭐
415
A Facebook crawler
Alltheplaces
⭐
395
A set of spiders and scrapers to extract location information from places that post their location on the internet.
Search Engine Parser
⭐
393
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Scavenger
⭐
384
Crawler (Bot) searching for credential leaks on paste sites.
Pywebcopy
⭐
373
Saves webpages locally to your hard disk with images, CSS, JS, and links as is.
Advanced Web Scraping Tutorial
⭐
370
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Dude
⭐
363
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Tiktoklive
⭐
363
Python library to receive live stream events (comments, gifts, etc.) in realtime from TikTok LIVE.
Opensanctions
⭐
363
An open database of international sanctions data, persons of interest and politically exposed persons
Tinderbotz
⭐
356
Automated Tinder bot and scraper using Selenium in Python.
Telegram Members Adder
⭐
355
Telegram Members Adding Software/Script Using Termux.
Scrape Linkedin Selenium
⭐
353
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Pdf.tocgen
⭐
345
A CLI toolset to generate table of contents for PDF files automatically.
1-100 of 3,122 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.