This repository contains the data for the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We will present our paper at EMNLP 2019.
Title: A Span-Extraction Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/D19-1600/
Venue: EMNLP-IJCNLP 2019
Keep track of the latest state-of-the-art systems on the CMRC 2018 dataset:
https://ymcui.github.io/cmrc2018/
Please download the CMRC 2018 public datasets via the following CodaLab worksheet:
https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce
If you would like to test your model on the hidden test and challenge sets, please follow the instructions on how to submit your model via the CodaLab worksheet:
https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/
**Note that the test set on CLUE is NOT the complete test set. If you wish to evaluate your model OFFICIALLY on CMRC 2018, you should follow the guidelines here.**
You can also access this dataset through the HuggingFace datasets
library as follows:
!pip install datasets
from datasets import load_dataset
dataset = load_dataset('cmrc2018')
More details on the options and usage for this library can be found in the datasets
repository at huggingface/datasets (formerly huggingface/nlp).
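
Because CMRC 2018 is a span-extraction dataset, each answer should be a substring of its passage. The sketch below shows one way to check that property on a single record. It assumes a SQuAD-style layout (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists); this layout is an assumption here, so print one record from the loaded dataset to confirm the field names before relying on it. The record `demo_record` and the helper `answers_match_context` are hypothetical and made up for illustration; they are not taken from the dataset or the library.

```python
# Hypothetical record in an assumed SQuAD-style layout.
# This is NOT real CMRC 2018 data; it only illustrates the span format.
demo_record = {
    "id": "DEMO_0",
    "context": "哈尔滨工业大学位于黑龙江省哈尔滨市。",
    "question": "哈尔滨工业大学位于哪里?",
    "answers": {
        "text": ["黑龙江省哈尔滨市"],
        "answer_start": [9],  # character offset of the answer in the context
    },
}


def answers_match_context(record):
    """Return True if every answer text is found at its stated offset."""
    context = record["context"]
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(texts, starts)
    )


if __name__ == "__main__":
    print(answers_match_context(demo_record))
```

A check like this is useful after any preprocessing step that alters the passage (for example whitespace normalization), since shifted character offsets silently break span-extraction training.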
If you wish to use our data in your research, please cite:
@inproceedings{cui-emnlp2019-cmrc2018,
title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
author = "Cui, Yiming and
Liu, Ting and
Che, Wanxiang and
Xiao, Li and
Chen, Zhipeng and
Ma, Wentao and
Wang, Shijin and
Hu, Guoping",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1600",
doi = "10.18653/v1/D19-1600",
pages = "5886--5891",
}
ISLRN: 013-662-947-043-2
http://www.islrn.org/resources/resources_info/7952/
Follow Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.
If you have any questions, please submit an issue.