Chronicrawl

Experimental continuous web crawler for web archiving
Alternatives To Chronicrawl
Heritrix3 (2,579 stars)
Most recent commit 5 months ago; 9 releases, latest July 27, 2022; 48 open issues; license: other; Java
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Awesome Web Archiving (1,669 stars)
Most recent commit 3 months ago; 3 open issues; license: cc0-1.0
An Awesome List for getting started with web archiving

Grab Site (1,254 stars)
Most recent commit a month ago; 92 open issues; license: other; Python
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Awesome Datahoarding (892 stars)
Most recent commit 7 months ago; 4 open issues
A list of data-hoarding-related tools

Brozzler (613 stars)
Most recent commit 3 months ago; 23 releases, latest January 02, 2020; 40 open issues; license: apache-2.0; Python
Brozzler: a distributed, browser-based web crawler

Archivebot (328 stars)
Most recent commit 5 months ago; 169 open issues; license: mit; Python
ArchiveBot, an IRC bot for archiving websites

Tumblr_crawler (258 stars)
Most recent commit 6 years ago; 2 open issues; license: gpl-3.0; Python
A multithreaded crawler for Tumblr.

Google Group Crawler (213 stars)
Most recent commit 2 years ago; 6 open issues; Shell
[Deprecated] Get (almost) original messages from Google Group archives. Your data is yours.

Cc Crawl Statistics (97 stars)
Most recent commit 4 months ago; license: apache-2.0; Python
Statistics of Common Crawl monthly archives mined from URL index files

Wget Lua (72 stars)
Most recent commit 4 months ago; 10 open issues; license: gpl-3.0; C
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Categories: Java, Archive, Crawler, Web Crawler, Chromium, Web Archiving