Alternatives To Chronicrawl

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Heritrix3	2,579		2	5 months ago	9	July 27, 2022	48	other	Java	Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Awesome Web Archiving	1,669			3 months ago			3	cc0-1.0		An Awesome List for getting started with web archiving
Grab Site	1,254			a month ago			92	other	Python	The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Awesome Datahoarding	892			7 months ago			4			List of data-hoarding related tools
Brozzler	613	2		3 months ago	23	January 02, 2020	40	apache-2.0	Python	brozzler - distributed browser-based web crawler
Archivebot	328			5 months ago			169	mit	Python	ArchiveBot, an IRC bot for archiving websites
Tumblr_crawler	258			6 years ago			2	gpl-3.0	Python	A multi-threaded crawler for Tumblr.
Google Group Crawler	213			2 years ago			6		Shell	[Deprecated] Get (almost) original messages from Google Groups archives. Your data is yours.
Cc Crawl Statistics	97			4 months ago				apache-2.0	Python	Statistics of Common Crawl monthly archives mined from URL index files
Wget Lua	72			4 months ago			10	gpl-3.0	C	Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Alternative Project Comparisons

Chronicrawl vs Heritrix3

Chronicrawl vs Awesome Web Archiving

Chronicrawl vs Grab Site

Chronicrawl vs Awesome Datahoarding

Chronicrawl vs Brozzler

Chronicrawl vs Archivebot

Chronicrawl vs Tumblr_crawler

Chronicrawl vs Google Group Crawler

Chronicrawl vs Cc Crawl Statistics

Chronicrawl vs Wget Lua

Popular Archive Projects

Vim ⭐ 34,054

The official Vim repository

total releases 1,494; latest release October 28, 2022; most recent commit 3 months ago

Singlefile ⭐ 12,865

Web Extension for saving a faithful copy of a complete web page in a single HTML file

most recent commit 3 months ago

Paperless ⭐ 7,828

Scan, index, and archive all of your paper documents

most recent commit 3 years ago

Jmeter ⭐ 7,700

Apache JMeter: an open-source load-testing tool for analyzing and measuring the performance of a variety of services

dependent packages 75; total releases 27; latest release July 11, 2023; most recent commit 3 months ago

Manifest ⭐ 7,240

Component for reading phar.io manifest information from a PHP Archive (PHAR)

dependent packages 13; total releases 8; latest release July 20, 2021; most recent commit a year ago

Popular Crawler Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445; total releases 96; latest release September 18, 2023; most recent commit 3 months ago

Lux ⭐ 24,752

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8; total releases 40; latest release November 06, 2023; most recent commit 17 days ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328; total releases 22; latest release March 08, 2022; most recent commit a month ago

Easyspider ⭐ 20,149

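A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual browser-automation testing, data-collection, and web-crawling application that can be used graphically without code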

most recent commit 15 days ago

Proxy_pool ⭐ 19,442

A Python ProxyPool for web spiders

most recent commit 3 months ago
