Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---|---|---
Wiki2vec | 587 | | | | 6 years ago | | | 21 | | Java
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby | | | | | | | | | |
Gensim Data | 492 | | | | 6 years ago | | | 14 | lgpl-2.1 | Python
Data repository for pretrained NLP models and NLP corpora. | | | | | | | | | |
Fact Extractor | 413 | | | | 8 years ago | | | 7 | | Python
Fact extraction from Wikipedia text. | | | | | | | | | |
Rel | 279 | | | | 5 months ago | 1 | December 12, 2022 | 12 | mit | Python
REL: Radboud Entity Linker. | | | | | | | | | |
Wikiplots | 234 | | | | 7 years ago | | | 5 | | Python
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. | | | | | | | | | |
Fasttextjapanesetutorial | 174 | | | | 8 years ago | | | | mit | Python
Tutorial to train fastText with a Japanese corpus. | | | | | | | | | |
Wp2txt | 160 | | | 1 | a year ago | 29 | May 13, 2023 | 1 | mit | Ruby
A command-line toolkit to extract text content and category data from Wikipedia dump files. | | | | | | | | | |
Pignlproc | 160 | | | | a year ago | | | 6 | | Java
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. | | | | | | | | | |
Wiki2text | 129 | | | | 5 years ago | | June 30, 2015 | 2 | mit | Nim
Extract a plain-text corpus from MediaWiki XML dumps, such as Wikipedia. | | | | | | | | | |
Kanji Frequency | 116 | | | | 3 months ago | | | 1 | cc-by-4.0 | Astro
Kanji usage frequency data collected from various sources. | | | | | | | | | |