Awesome Open Source

Search results for python scraper

1,373 search results found

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

Newspaper ⭐ 13,147

News, full-text, and article metadata extraction in Python 3.

Requests Html ⭐ 13,100

Pythonic HTML Parsing for Humans™

Chinese Xinhua ⭐ 10,425

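📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.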

Portia ⭐ 8,982

Visual scraping for Scrapy

Undetected Chromedriver ⭐ 7,232

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

Autoscraper ⭐ 5,159

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Douyin_tiktok_download_api ⭐ 4,844

🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin, Kuaishou, T

Mygptreader ⭐ 4,267

A community-driven way to read and chat with AI bots - powered by chatGPT.

Python Scraping ⭐ 4,136

Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do

Snscrape ⭐ 3,992

A social networking service scraper in Python

Data Science ⭐ 3,898

Collection of useful data science topics along with articles, videos, and code

Fake Useragent ⭐ 3,356

Up-to-date simple useragent faker with real world database

Automatic Udemy Course Enroller Get Paid Udemy Courses For Free ⭐ 3,010

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

Python ⭐ 2,978

Python Books && Courses

Tweets_analyzer ⭐ 2,894

Tweets metadata scraper & activity analyzer

Googlescraper ⭐ 2,540

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

Snoop ⭐ 2,530

Snoop: a reconnaissance tool based on open-source data (OSINT world)

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Web Scraping Framework

Weibo_terminater ⭐ 2,265

The final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator.

Bulk Downloader For Reddit ⭐ 2,142

Downloads and archives content from reddit

Facebook Page Post Scraper ⭐ 2,014

Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

Facebook Scraper ⭐ 1,936

Scrape Facebook public pages without an API key

Twitterscraper ⭐ 1,852

Scrape Twitter for Tweets

Linkedin_scraper ⭐ 1,534

A library that scrapes Linkedin for user data

Jobfunnel ⭐ 1,533

Scrape job websites into a single spreadsheet with no duplicates.

Recipe Scrapers ⭐ 1,408

Python package for scraping recipes data

Jd Autobuy ⭐ 1,291

Python crawler for JD.com: automatic login and online flash-purchasing of products

Shot Scraper ⭐ 1,285

A command-line utility for taking automated screenshots of websites

Cloudproxy ⭐ 1,235

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

DOM Traversing and Scraping using GraphQL

Cinemagoer ⭐ 1,156

Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies

Informer ⭐ 1,141

A Telegram Mass Surveillance Bot in Python

Scrapy Cluster ⭐ 1,137

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Event Collect ⭐ 1,110

Scraper and converter from event website listings to the Open Event format

Animdl ⭐ 1,105

A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

Fansly Downloader ⭐ 1,103

Easy to use fansly.com content downloading tool. Written in python, but ships as a standalone Executable App for Windows too. Enjoy your Fansly content offline anytime, anywhere in the highest possible content resolution! Fully customizable to download in bulk or single: photos, videos & audio from timeline, messages, collection & specific posts 👍

Django Dynamic Scraper ⭐ 1,069

Creating Scrapy scrapers via the Django admin interface

Scanless ⭐ 1,061

online port scan scraper

Parliament Scraper ⭐ 1,049

Public Data Scraper for Parliament Data for the EU and other Parliaments

Redditdownloader ⭐ 1,045

Scrapes Reddit to download media of your choice.

Crawler User Agents ⭐ 1,045

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome ⭐

Osi.ig ⭐ 1,027

Information Gathering Instagram.

Parsel ⭐ 1,010

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Unofficial API for finviz.com

Querido Diario ⭐ 944

📰 Brazilian government gazettes, accessible to everyone.

Mlscraper ⭐ 935

🤖 Scrape data from HTML websites automatically by just providing examples

Instagram Crawler ⭐ 922

Get Instagram posts/profile/hashtag data without using Instagram API

Clean Text ⭐ 810

🧹 Python package for text cleaning

Scrapyrt ⭐ 793

HTTP API for Scrapy spiders

Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated.

Loconotion ⭐ 775

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

Amazon Scraper Python ⭐ 766

Non-official client to get some info about products sold on Amazon

[Unmaintained] A simple and clean video/music/image downloader 👾

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

Gazpacho ⭐ 716

🥫 The simple, fast, and modern web scraping library

Setup script for Regon-ng

Edu Mail Generator ⭐ 707

Generate Free Edu Mail(s) within minutes

Episode Rename ⭐ 699

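Automated renaming tool for TV series and anime, with one-click batch renaming. Works with qBittorrent to rename files automatically after download, which makes automatic metadata scraping in Emby easier. Runs on Windows, Linux, macOS, Docker, and as a Synology package.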

Bookcorpus ⭐ 698

Crawl BookCorpus

Pdfquery ⭐ 693

A fast and friendly PDF scraping library.

Google Play Scraper ⭐ 645

Google play scraper for Python inspired by <facundoolano/google-play-scraper>

Dataengineeringproject ⭐ 644

Example end to end data engineering project.

Tiktoklive ⭐ 623

Python library to receive live stream events (comments, gifts, etc.) in realtime from TikTok LIVE.

Easy Scraping Tutorial ⭐ 618

Simple but useful Python web scraping tutorial code.

Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp

Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool.

Linkedin ⭐ 602

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

Imagescraper ⭐ 572

✂️ High performance, multi-threaded image scraper

Instascrape ⭐ 554

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

Dryscrape ⭐ 523

[not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages

Social Media Profiles Regexs ⭐ 508

📇 Extract social media profiles and more with regular expressions

Alltheplaces ⭐ 502

A set of spiders and scrapers to extract location information from places that post their location on the internet.

Complete Life Cycle Of A Data Science Project ⭐ 499

Complete-Life-Cycle-of-a-Data-Science-Project

Comic Dl ⭐ 498

Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver, and many more.

Jekyll-based static site for The Programming Historian

Google Search Scraper

Spidermon ⭐ 486

Scrapy Extension for monitoring spiders execution.

Twitter_scraping ⭐ 479

Grab all of a user's tweets (and get past the 3,200-tweet limit)

Telegram Members Adder ⭐ 475

Telegram Members Adding Software/Script Using Termux.

Cryptocmd ⭐ 472

Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.

Tinderbotz ⭐ 468

Automated Tinder bot and scraper using Selenium in Python.

Openwebtext ⭐ 463

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.

Pywebcopy ⭐ 455

Locally saves webpages to your hard disk with images, css, js & links as is.

Scrapple ⭐ 452

A framework for creating semi-automatic web content extractors

Awesome Scrapy ⭐ 450

A curated list of awesome packages, articles, and other cool resources from the Scrapy community.

Pdf.tocgen ⭐ 444

A CLI toolset to generate table of contents for PDF files automatically.

Auto Archiver ⭐ 439

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

Movie metadata scraper

Google Search Results Python ⭐ 432

Google Search Results via SERP API pip Python Package

Covid_19 ⭐ 428

COVID19 case numbers of Cantons of Switzerland and Principality of Liechtenstein (FL). The data is updated at best once a day (times of collection and update may vary). Start with the README.

Newsdiffs ⭐ 418

Automatic scraper that tracks changes in news articles over time.

Search Engine Parser ⭐ 416

Lightweight package to query popular search engines and scrape for result titles, links and descriptions

Fbcrawl ⭐ 415

A Facebook crawler

Docker Selenium Lambda ⭐ 402

The simplest demo of Chrome automation with Python and Selenium in AWS Lambda

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Scavenger ⭐ 384

Crawler (Bot) searching for credential leaks on paste sites.

Basketball_reference_web_scraper ⭐ 382

NBA Stats API via Basketball Reference

Search Engines Scraper ⭐ 377

Search Google, Bing, Yahoo, and other search engines with Python

1-100 of 1,373 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.