Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Trafilatura | 2,447 | 66 | 3 months ago | 39 | November 29, 2023 | 66 | gpl-3.0 | Python | ||
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments | ||||||||||
Bing Ip2hosts | 91 | | | | 3 years ago | | | | gpl-3.0 | Shell |
bingip2hosts is a Bing.com web scraper that discovers websites by IP address | ||||||||||
Tiktok Trending Data | 60 | 3 months ago | 1 | |||||||
Scraping the TikTok discovery web API every 15 minutes using GitHub Actions to view changes | ||||||||||
Crawlkit | 23 | 6 | 5 | 7 years ago | 34 | May 23, 2016 | 1 | mit | JavaScript | |
A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers. | ||||||||||
Pukpuk | 9 | | | | 2 years ago | 18 | August 05, 2022 | | mit | Python |
HTTP discovery and change monitoring tool | ||||||||||
Scraper | 6 | | | | 6 years ago | | | | mit | Go |
Example of using prometheus discovery and scraping library | ||||||||||
Torrentd | 5 | | | | a year ago | | | | mit | Go |
Torrent discovery and tracking server | ||||||||||
Locust | 5 | 2 | 4 years ago | 4 | June 09, 2020 | mit | JavaScript | |||
Distributed web data discovery and collection framework built for serverless |