Project Name	Stars	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Nlp_chinese_corpus	8,344		a year ago			20	mit
Large Scale Chinese Corpus for NLP
Asrt_speechrecognition	7,253		4 months ago	1	October 23, 2020	101	gpl-3.0	Python
A Deep-Learning-Based Chinese Speech Recognition System
Pycorrector	4,928	1	4 months ago	30	November 07, 2023	27	apache-2.0	Python
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.
Chinese Names Corpus	3,719		4 months ago			7	apache-2.0
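Chinese personal names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.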
Clue	3,345		a year ago			73		Python
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Uer Py	2,802		5 months ago			132	apache-2.0	Python
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Cluedatasetsearch	2,778		a year ago			6		Python
Search all Chinese NLP datasets, plus commonly used English NLP datasets
Weibo_terminater	2,265		5 years ago			9		Python
Final Weibo crawler: scrape anything from Weibo (comments, post contents, followers, anything). The Terminator
Gpt2 Ml	1,674		a year ago			22	apache-2.0	Python
GPT2 for Multiple Languages, including pretrained models. Includes a 1.5-billion-parameter Chinese pretrained model.
Rasa_nlu_chi	1,466		6 months ago			79	apache-2.0	Python
Turn Chinese natural language into structured data (Chinese natural language understanding)

Alternatives To Cluecorpus2020

Popular Chinese Projects

Iptv ⭐ 74,798

Collection of publicly available IPTV channels from all over the world

most recent commit 3 months ago

Howtocook ⭐ 57,819

Programmer's guide about how to cook at home (Chinese only).

dependent packages 1; total releases 4; latest release July 16, 2022; most recent commit 3 months ago

D2l Zh ⭐ 56,684

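Dive into Deep Learning: written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.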

dependent packages 1; total releases 51; latest release August 18, 2023; most recent commit a month ago

Element ⭐ 53,857

A Vue.js 2.0 UI Toolkit for Web

dependent packages 4; total releases 7; latest release September 22, 2020; most recent commit 3 months ago

Chinese Poetry ⭐ 45,313

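The most comprehensive database of Chinese poetry 🧶 Nearly 14,000 poets of the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems, and 21,050 ci (lyric poems) by 1,564 ci poets of the Northern and Southern Song periods.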

most recent commit 5 months ago

Popular Corpus Projects

Nltk ⭐ 12,699

NLTK Source

dependent packages 2,261; total releases 59; latest release July 20, 2023; most recent commit 4 months ago

Glove ⭐ 6,480

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings

most recent commit 7 months ago

Bert Pytorch ⭐ 5,605

Google AI 2018 BERT pytorch implementation

total releases 5; latest release October 23, 2018; most recent commit 9 months ago

Tensorflow Wavenet ⭐ 5,362

A TensorFlow implementation of DeepMind's WaveNet paper

most recent commit 10 months ago

Nlp Datasets ⭐ 5,235

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)

most recent commit a year ago
