Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---|---|---
Wiki2vec | 587 | | | | 6 years ago | | | 21 | | Java
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby | | | | | | | | | |
Gensim Data | 492 | | | | 6 years ago | | | 14 | lgpl-2.1 | Python
Data repository for pretrained NLP models and NLP corpora. | | | | | | | | | |
Fact Extractor | 413 | | | | 8 years ago | | | 7 | | Python
Fact extraction from Wikipedia text. | | | | | | | | | |
Rel | 279 | | | | 5 months ago | 1 | December 12, 2022 | 12 | mit | Python
REL: Radboud Entity Linker. | | | | | | | | | |
Wikiplots | 234 | | | | 7 years ago | | | 5 | | Python
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. | | | | | | | | | |
Fasttextjapanesetutorial | 174 | | | | 8 years ago | | | | mit | Python
Tutorial to train fastText with a Japanese corpus. | | | | | | | | | |
Wp2txt | 160 | | | 1 | a year ago | 29 | May 13, 2023 | 1 | mit | Ruby
A command-line toolkit to extract text content and category data from Wikipedia dump files. | | | | | | | | | |
Pignlproc | 160 | | | | a year ago | | | 6 | | Java
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. | | | | | | | | | |
Wiki2text | 129 | | | | 5 years ago | | June 30, 2015 | 2 | mit | Nim
Extract a plain-text corpus from MediaWiki XML dumps, such as Wikipedia. | | | | | | | | | |
Kanji Frequency | 116 | | | | 3 months ago | | | 1 | cc-by-4.0 | Astro
Kanji usage frequency data collected from various sources. | | | | | | | | | |