Awesome Open Source


Search results for web crawling

86 search results found

Crawlee ⭐ 12,402

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Heritrix3 ⭐ 2,579

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Scrapyrt ⭐ 793

HTTP API for Scrapy spiders

Listed Company News Crawl And Text Analysis ⭐ 689

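Crawls historical news text data on listed companies (individual stocks) from Sina Finance, National Business Daily (NBD), JRJ, China Securities Network, and Securities Times, to perform text…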

Opensearchserver ⭐ 419

Open-source Enterprise Grade Search Engine Software

Botasaurus ⭐ 331

The All in One Web Scraping Framework

Crawler ⭐ 285

Library for Rapid (Web) Crawler and Scraper Development

Infinitycrawler ⭐ 221

A simple but powerful web crawler library for .NET

Amazon Scraper ⭐ 219

A simple web scraper to extract Product Data and Pricing from Amazon

Ayakashi ⭐ 177

⚡ Ayakashi.io - The next generation web scraping framework

Bet On Sibyl ⭐ 157

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

Scrapy Training ⭐ 141

Scrapy Training companion code

Raspagem De Dados Para Iniciantes ⭐ 115

Web scraping for beginners using Scrapy and other basic libraries

Bancocentralbrasil ⭐ 112

💵 💰 🇧🇷 Information on official daily rates for inflation, Selic, savings (Poupança), US dollar, PTAX dollar, euro, and PTAX euro from the Banco Central do Brasil website

Seleniumcrawler ⭐ 105

An example using Selenium webdrivers for Python and the Scrapy framework to create a web scraper that crawls an ASP site

A web crawling framework written in Kotlin

Terpene Profile Parser For Cannabis Strains ⭐ 93

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Scrapyd Cluster On Heroku ⭐ 90

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

Malheatmap ⭐ 87

An extension for tracking your activities on myanimelist.net

Katastrophe ⭐ 86

Command Line Tool to download torrents

Bancocentralbrasil ⭐ 71

💵 💰 🇧🇷 Information on official daily rates for inflation, Selic, savings (Poupança), US dollar, PTAX dollar, euro, and PTAX euro from the Banco Central do Brasil website

Robots.txt ⭐ 69

Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-0

Daenerys ⭐ 65

Scraping and Web Crawling Framework For Zhihu Live

Amazon_scraper ⭐ 64

Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt

Dotnetcrawler ⭐ 63

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-w

Newspaperjs ⭐ 63

News extraction and scraping. Article Parsing

Clauneck ⭐ 57

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters

JAW: A Graph-based Security Analysis Framework for Client-side JavaScript

Pythonframeworks ⭐ 49

Another curated list of Python frameworks

Scrapy Craigslist ⭐ 47

Web Scraping Craigslist's Engineering Jobs in NY with Scrapy

Proxy_web_crawler ⭐ 39

Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords

Fifa Fut Data ⭐ 39

Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB

Amazon Flipkart Price Comparison Engine ⭐ 36

Compares the price of a user-entered product across the e-commerce sites Amazon and Flipkart 💰 📊

Flink Crawler ⭐ 35

Continuous scalable web crawler built on top of Flink and crawler-commons

Url Frontier ⭐ 34

API definition, resources and reference implementation of URL Frontiers

Omnisci3nt ⭐ 34

Unveiling the Hidden Layers of the Web – A Comprehensive Web Reconnaissance Tool

Tibia.py ⭐ 32

API to parse tibia.com content into Python objects.

Spidyquotes ⭐ 30

Example site for web scraping tutorials

Web Scraping Framework

Webtranspose ⭐ 27

Web scraping API for building AI applications.

Tweetsolaping ⭐ 24

Implements an end-to-end tweet ETL/analysis pipeline.

Knowledgegraph ⭐ 22

This repository is for web crawling, information extraction, and knowledge graph construction.

An open source web crawling platform

Udacity Data Analyst Nanodegree ⭐ 19

Amazon Mobile Sentiment Analysis ⭐ 18

Opinion mining of Mobile reviews on Amazon platform

Crawlerx ⭐ 16

CrawlerX - An extensible, distributed, scalable crawler system: a web platform that can crawl URLs over different kinds of protocols in a distributed way.

Stock Fundamental Data Scraping And Analysis ⭐ 14

Project on building a web crawler to collect stock fundamentals and review their performance in one go

Selenium Twitter Scraper ⭐ 14

This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.

Dynamic Web Crawlering Python ⭐ 14

This repo is mainly for dynamic web (Ajax Tech) crawling using Python, taking China's NSTL websites as an example.

A lightweight crawling/spider framework for everyone (supports JavaScript!). ✨

Olx_scraper ⭐ 13

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for the requested product and dumps them to MongoDB (NoSQL).

Microwler ⭐ 12

A micro-framework for asynchronous deep crawls and web scraping with Python

Webhunterscreen ⭐ 12

This program aims to check active targets by saving screenshots in a project.

Scrawler ⭐ 11

Scala web crawling and scraping using fs2 streams

Deep_learning ⭐ 11

Projects on NLP knowledge graphs, web crawling, word embeddings, and entity and relation extraction.

Alibaba_scraper ⭐ 10

Alibaba scraper using rotating proxies and headless Chrome from ScrapingAnt

Scrapyteer ⭐ 9

Web crawling & scraping framework for Node.js on top of headless Chrome browser

Frontera_example ⭐ 9

Example frontera project

Amazon Captcha Solver ⭐ 9

A TensorFlow (Deep Learning - CNN) based solution for tackling captcha when collecting data from Amazon.

Dataanalysis_bootcamp_crawler ⭐ 8

Web scraper implementations for a variety of websites.

Autoproxy ⭐ 8

Public proxy farm that automatically records and queues suitable proxy servers for web crawling

Dotnetexpose ⭐ 8

A package that helps you scrape web pages. It shows you a lot of information about the page.

Golang Web Scraping ⭐ 8

Learn how to scrape web content from HTML and see how web scraping differs from web crawling

Open, collaborative, AI-driven parser builder for web scraping, data extraction and crawling, and knowledge graphs

Teanaps Web Scraper ⭐ 8

Provides web scraping tools for collecting data for text analysis.

Socials_regex ⭐ 8

🪡 Social account detection and extraction in ruby, e.g. for crawling/scraping.

Best Games Of All Time Data Based ⭐ 7

🏆 Definitive best games of all time, based on data from multiple sources

Botasaurus Starter ⭐ 7

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Born2crawl ⭐ 7

A highly performant and versatile crawling engine, designed with scalability and extensibility in mind.

GenBank Record downloader for taxonomists

Web Search Engine Uic ⭐ 6

CS 582 Information Retrieval at University of Illinois at Chicago. Multithreaded crawling of UIC domain, inverted index, page rank, SEO with Context Pseudo-Relevance Feedback

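(Updated) Data APIs for Taobao (with exact pre-sale volume and exact monthly sales volume), Pinduoduo, Xiaohongshu, WeChat Official Accounts, Dianping, Kuaishou, and JD.com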

Data Mining 51job ⭐ 6

Data-mining on 51Job website

Common_crawl_corpus ⭐ 6

Scripts for building a geo-located web corpus using Common Crawl data

Search Engine ⭐ 6

Application made with Node.js and Python.

Robots Txt ⭐ 6

Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping

Web Crawler ⭐ 6

A Web Crawler developed in Python.

Zoominfo_scraper ⭐ 6

Zoominfo scraper using rotating proxies and headless Chrome from ScrapingAnt

Spiderboi ⭐ 5

A web crawling library written in TypeScript.

Web Crawler ⭐ 5

Web Crawler with Python

Time-series data analysis and crawling techniques using Jupyter Notebook, plus implementation and study of visualization techniques using D3

Scrapes attendance and marks data from AURIS (Ahmedabad University Resource Information System) and notifies users so they do not have to check their data repeatedly



Copyright 2018-2024 Awesome Open Source. All rights reserved.