Corpora Alternatives

Name: dariusk/corpora

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

Stars: 4,757
License: No license specified
Open Issues: 15
Most Recent Commit: over 2 years ago
Programming Language: JavaScript
Dependent Repos: 0
Dependent Packages: 2
Total Releases: 1
Latest Release: May 17, 2018

Categories

Programming Languages > Javascript

Data Processing > Corpus

Alternatives To dariusk/corpora

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
nltk/nltk	12,699	10,496	2,261	over 2 years ago	59	July 20, 2023	268	apache-2.0	Python
NLTK Source
brightmart/nlp_chinese_corpus	8,344	0	0	about 3 years ago	0		20	mit
Large Scale Chinese Corpus for NLP
nl8590687/ASRT_SpeechRecognition	7,253	0	0	over 2 years ago	1	October 23, 2020	101	gpl-3.0	Python
A Deep-Learning-Based Chinese Speech Recognition System
stanfordnlp/GloVe	6,480	0	0	almost 3 years ago	0		80	apache-2.0	C
Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
codertimo/BERT-pytorch	5,605	1	0	about 3 years ago	5	October 23, 2018	63	apache-2.0	Python
Google AI 2018 BERT pytorch implementation
ibab/tensorflow-wavenet	5,362	0	0	about 3 years ago	0		176	mit	Python
A TensorFlow implementation of DeepMind's WaveNet paper
niderhoff/nlp-datasets	5,235	0	0	over 3 years ago	0		7
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
vespa-engine/vespa	5,115	5	58	over 2 years ago	741	November 30, 2023	175	apache-2.0	Java
AI + Data, online. https://vespa.ai
shibing624/pycorrector	4,928	0	1	over 2 years ago	30	November 07, 2023	27	apache-2.0	Python
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error correction and works out of the box.
dariusk/corpora	4,757	0	2	over 2 years ago	1	May 17, 2018	15		JavaScript
A collection of small corpuses of interesting data for the creation of bots and similar stuff.

Popular Corpus Projects

dvyukov/go-fuzz ⭐ 4,674

Randomized testing for Go

facebook/duckling ⭐ 3,974

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

wainshine/Chinese-Names-Corpus ⭐ 3,719

Chinese personal-name corpus and name generator, covering Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Usable for Chinese word segmentation and person-name entity recognition.

buriburisuri/speech-to-text-wavenet ⭐ 3,586

Speech-to-Text-WaveNet: end-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow

codemayq/chinese_chatbot_corpus ⭐ 3,550

Chinese public chat corpus
