Awesome Open Source
Search results for python scraping
3,132 search results found
Scrapy, a fast high-level web crawling & scraping framework for Python.
Pythonic HTML Parsing for Humans™
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Visual scraping for Scrapy
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
A community-driven way to read and chat with AI bots, powered by ChatGPT.
Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do
Collection of useful data science topics along with articles, videos, and code
A social networking service scraper in Python
Scrape all the media from an OnlyFans account - Updated regularly
Up-to-date simple useragent faker with real world database
Python Books && Courses
Automatic Udemy Course Enroller: Get Paid Udemy Courses For Free
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Tweets metadata scraper & activity analyzer
A Python module to scrape several search engines (such as Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support.
Final Weibo Crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator
Web Scraping Framework
Facebook Page Post Scraper
Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis
Snoop: a reconnaissance tool based on open data (OSINT world)
Scrape Twitter for Tweets
Bulk Downloader For Reddit
Downloads and archives content from reddit
Scrape Facebook public pages without an API key
Scrape job websites into a single spreadsheet with no duplicates.
Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
Python package for scraping recipes data
DOM Traversing and Scraping using GraphQL
A library that scrapes LinkedIn for user data
A Telegram Mass Surveillance Bot in Python
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies
Scraper and converter from event website listings to the Open Event format
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Public Data Scraper for Parliament Data for the EU and other Parliaments
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
online port scan scraper
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Scrapes Reddit to download media of your choice.
A command-line utility for taking automated screenshots of websites
Crawler User Agents
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome ⭐️
Information Gathering Instagram.
🤖 Scrape data from HTML websites automatically by just providing examples
A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.
Get Instagram posts/profile/hashtag data without using Instagram API
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Unofficial API for finviz.com
📰 Brazilian government gazettes, accessible to everyone.
🧹 Python package for text cleaning
Scrape content from OnlyFans and Fansly
Amazon Scraper Python
Non-official client to get some info about products sold on Amazon
[Unmaintained] A simple and clean video/music/image downloader 👾
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Setup script for Recon-ng
Edu Mail Generator
Generate Free Edu Mail(s) within minutes
HTTP API for Scrapy spiders
A fast and friendly PDF scraping library.
Example end to end data engineering project.
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool.
✂️ High performance, multi-threaded image scraper
Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated.
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Google Play Scraper
Google play scraper for Python inspired by <facundoolano/google-play-scraper>
🥫 The simple, fast, and modern web scraping library
Executable Downloader App - an absolute must-have for Fansly enthusiasts. With this easy-to-use content downloading tool, you can download all your favorite content from fansly.com. No more manual downloads; enjoy your Fansly content offline anytime, anywhere! Fully customizable to download photos, videos, messages, collections & single posts 🔥
Social Media Profiles Regexs
📇 Extract social media profiles and more with regular expressions
LinkedIn Scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Google Search Scraper
Jekyll-based static site for The Programming Historian
Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver and many more.
Grab all a user's tweets (and get past 3200 limit)
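Automated renaming tool for TV series and anime, with one-click batch renaming. Automatically renames files after qBittorrent downloads them, which makes automatic metadata scraping in Emby easier. Runs on Windows, Linux, macOS, Docker, and as a Synology package.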
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
Scrapy Extension for monitoring spiders execution.
A framework for creating semi-automatic web content extractors
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
COVID19 case numbers of Cantons of Switzerland and Principality of Liechtenstein (FL). The data is updated at best once a day (times of collection and update may vary). Start with the README.
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Automatic scraper that tracks changes in news articles over time.
Complete Life Cycle Of A Data Science Project
A Facebook crawler
A set of spiders and scrapers to extract location information from places that post their location on the internet.
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Crawler (Bot) searching for credential leaks on paste sites.
Locally saves webpages to your hard disk with images, css, js & links as is.
Advanced Web Scraping Tutorial
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Telegram Members Adder
Telegram Members Adding Software/Script Using Termux.
An open database of international sanctions data, persons of interest and politically exposed persons
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Python library to receive live stream events (comments, gifts, etc.) in realtime from TikTok LIVE.
Automated Tinder bot and scraper using Selenium in Python.
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
A CLI toolset to generate table of contents for PDF files automatically.