Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nlp_chinese_corpus | 8,344 | | | | a year ago | | | 20 | mit | |
Large Scale Chinese Corpus for NLP | ||||||||||
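Chinese Names Corpus | 3,719 | | | | 5 months ago | | | 7 | apache-2.0 | |
Chinese personal-name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition. | ||||||||||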
Clue | 3,345 | | | | a year ago | | | 73 | | Python |
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard | ||||||||||
Cluedatasetsearch | 2,778 | | | | a year ago | | | 6 | | Python |
Search all Chinese NLP datasets, plus commonly used English NLP datasets | ||||||||||
Dialog_corpus | 1,487 | | | | 4 years ago | | | 2 | | Python |
Datasets for training Chinese and English chatbot systems | ||||||||||
Entity Recognition Datasets | 1,386 | | | | 7 months ago | | | 7 | mit | Python |
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types. | ||||||||||
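Company Names Corpus | 1,106 | | | | a year ago | | | 3 | apache-2.0 | |
Company-name and organization-name corpus. Covers company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition. | ||||||||||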
Insuranceqa Corpus Zh | 996 | | | | 6 months ago | 11 | November 15, 2023 | 9 | other | Python |
Insurance industry corpus for chatbots | ||||||||||
Cdial Gpt | 944 | | | | 2 years ago | | | 10 | mit | Python |
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models | ||||||||||
Nlp Datasets | 871 | | | | 4 years ago | | | 6 | | |
A list of datasets/corpora for NLP tasks, in reverse chronological order. |