Pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
Alternatives To Pignlproc
| Project Name | Stars | Downloads / Repos / Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|
| Wiki2vec | 587 | | 6 years ago | | | 21 | | Java | Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby |
| Gensim Data | 492 | | 6 years ago | | | 14 | lgpl-2.1 | Python | Data repository for pretrained NLP models and NLP corpora. |
| Fact Extractor | 413 | | 8 years ago | | | 7 | | Python | Fact Extraction from Wikipedia Text |
| Rel | 279 | | 4 months ago | 1 | December 12, 2022 | 12 | mit | Python | REL: Radboud Entity Linker |
| Wikiplots | 234 | | 7 years ago | | | 5 | | Python | A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. |
| Fasttextjapanesetutorial | 174 | | 7 years ago | | | | mit | Python | Tutorial to train fastText with Japanese corpus |
| Wp2txt | 160 | 1 | 10 months ago | 29 | May 13, 2023 | 1 | mit | Ruby | A command-line toolkit to extract text content and category data from Wikipedia dump files |
| Pignlproc | 160 | | a year ago | | | 6 | | Java | Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. |
| Wiki2text | 129 | | 5 years ago | | June 30, 2015 | 2 | mit | Nim | Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia. |
| Kanji Frequency | 116 | | 2 months ago | | | 1 | cc-by-4.0 | Astro | Kanji usage frequency data collected from various sources |


Categories: Java, Corpus, Hadoop, Wikipedia