Archivespark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Alternatives To Archivespark
Aut
Stars: 128; most recent commit: 9 months ago; total releases: 27; latest release: November 17, 2022; open issues: 3; license: apache-2.0; language: Scala
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Archivespark
Stars: 118; most recent commit: 3 years ago; total releases: 7; latest release: September 16, 2019; open issues: 4; license: mit; language: Scala
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

Sradbv2
Stars: 19; most recent commit: 5 years ago; open issues: 12; language: R
R interface to the NCBI SRA metadata.

Notebooks
Stars: 18; most recent commit: a year ago; license: apache-2.0; language: Jupyter Notebook
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.

Nspark
Stars: 12; most recent commit: 2 years ago; open issues: 2; language: C
Nspark dearchiver for RISC OS archives.

Docker Aut
Stars: 11; most recent commit: a year ago; license: other; language: Dockerfile
Docker image for the Archives Unleashed Toolkit.

Twut
Stars: 7; most recent commit: 2 years ago; total releases: 1; latest release: December 10, 2019; open issues: 1; license: apache-2.0; language: Scala
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
Topics: Scala, Processing, Spark, Archive, Web Archiving