Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Crawlee | 11,736 | 42 | 10 days ago | 747 | December 10, 2023 | 96 | apache-2.0 | TypeScript | ||
Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation. | ||||||||||
Headless Chrome Crawler | 5,051 | 10 | 12 | 3 years ago | 21 | June 11, 2018 | 28 | mit | JavaScript | |
Distributed crawler powered by Headless Chrome | ||||||||||
Browser Fingerprinting | 3,353 | a year ago | 7 | JavaScript | ||||||
Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️ when scraping the web? | ||||||||||
Puppeteer Sharp | 3,087 | 27 | 120 | 14 days ago | 78 | December 05, 2023 | 120 | mit | C# | |
Headless Chrome .NET API | ||||||||||
Awesome Puppeteer | 2,245 | 3 months ago | 19 | |||||||
A curated list of awesome Puppeteer resources. | ||||||||||
Rendora | 1,950 | a year ago | 1 | January 04, 2019 | 28 | apache-2.0 | Go | |||
Dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern JavaScript websites | ||||||||||
X Crawl | 718 | 1 | 2 months ago | 59 | November 09, 2023 | 5 | mit | TypeScript | ||
x-crawl is a flexible, multifunctional Node.js crawler library. Its flexible usage and numerous functions can help you quickly, safely, and stably crawl pages, interfaces, and files. | ||||||||||
Jvppeteer | 549 | 9 months ago | 15 | October 30, 2021 | 67 | apache-2.0 | Java | |||
Headless Chrome for Java (Java crawler) | ||||||||||
Browsertrix Crawler | 470 | 2 months ago | 91 | agpl-3.0 | JavaScript | |||||
Run a high-fidelity browser-based crawler in a single Docker container | ||||||||||
Webster | 465 | 2 | 5 months ago | 42 | November 09, 2023 | 1 | gpl-3.0 | JavaScript | ||
A reliable high-level web crawling & scraping framework for Node.js. |