Wiki Dump Reader

Extract corpora from Wikipedia dumps
Alternatives To Wiki Dump Reader
Wiki2vec
Stars: 587. Most recent commit: 6 years ago. Open issues: 21. Language: Java.
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Gensim Data
Stars: 492. Most recent commit: 6 years ago. Open issues: 14. License: LGPL-2.1. Language: Python.
Data repository for pretrained NLP models and NLP corpora.

Fact Extractor
Stars: 413. Most recent commit: 8 years ago. Open issues: 7. Language: Python.
Fact Extraction from Wikipedia Text

Rel
Stars: 279. Most recent commit: 5 months ago. Total releases: 1. Latest release: December 12, 2022. Open issues: 12. License: MIT. Language: Python.
REL: Radboud Entity Linker

Wikiplots
Stars: 234. Most recent commit: 7 years ago. Open issues: 5. Language: Python.
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.

Fasttextjapanesetutorial
Stars: 174. Most recent commit: 8 years ago. License: MIT. Language: Python.
Tutorial to train fastText with Japanese corpus

Wp2txt
Stars: 160. Used by: 1. Most recent commit: a year ago. Total releases: 29. Latest release: May 13, 2023. Open issues: 1. License: MIT. Language: Ruby.
A command-line toolkit to extract text content and category data from Wikipedia dump files

Pignlproc
Stars: 160. Most recent commit: a year ago. Open issues: 6. Language: Java.
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Wiki2text
Stars: 129. Most recent commit: 5 years ago. Latest release: June 30, 2015. Open issues: 2. License: MIT. Language: Nim.
Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia.

Kanji Frequency
Stars: 116. Most recent commit: 3 months ago. Open issues: 1. License: CC-BY-4.0. Language: Astro.
Kanji usage frequency data collected from various sources