Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Wikiteam | 661 | | | | 4 months ago | | | 159 | gpl-3.0 | Python |
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis. | ||||||||||
Wikipedia Extractor | 247 | | | | 8 years ago | | | 1 | | Python |
This is a mirror of the script by Giuseppe Attardi, and contains history from before the official repo started: https://github.com/attardi/wikiextractor --- Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory. | ||||||||||
Json Wikipedia | 244 | | | | 2 years ago | | | 6 | apache-2.0 | Java |
Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON/Avro dump | ||||||||||
Dumpster Dive | 214 | 1 | 2 | 8 months ago | 34 | July 04, 2023 | 8 | other | JavaScript | |
Roll a Wikipedia dump into MongoDB | ||||||||||
Go Xml Parse | 117 | | | | 9 years ago | | November 26, 2023 | 2 | | Go |
Streaming XML parser example in Go | ||||||||||
Annotated Wikiextractor | 88 | | | | 13 years ago | | | | gpl-3.0 | Python |
Simple Wikipedia plain text extractor with article link annotations and Hadoop support. | ||||||||||
Xs4s | 50 | 1 | 3 years ago | 6 | July 27, 2021 | 1 | other | Scala | ||
XML Streaming for Scala including FS2/cats support | ||||||||||
Wikidump | 41 | | | | 11 years ago | 4 | April 10, 2013 | 4 | gpl-3.0 | Python |
Tools to manipulate and extract data from Wikipedia dumps | ||||||||||
Wikihistoryflow | 39 | | | | 7 years ago | | | 1 | | PHP |
Visualise Wikipedia page edits using History Flow | ||||||||||
Wikiforia | 31 | | | | 7 years ago | | | 9 | gpl-2.0 | Java |
A Utility Library for Wikipedia dumps |