A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)

This repository contains the data for the Third Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2019). We will present our paper at COLING 2020.

Title: A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu
Venue: COLING 2020

Open Challenge Leaderboard (New!)

Keep track of the latest state-of-the-art systems on the CMRC 2019 dataset.

Submission Guidelines

If you would like to test your model on the hidden test and challenge sets, please follow the instructions on how to submit your model via a CodaLab worksheet.

Directory Guide

  • baseline: a Chinese BERT-based simple baseline system

  • eval: contains official evaluation script

  • data: contains official evaluation data

  • sample_submission: sample submissions for the CodaLab competition platform (one randomly generated prediction file and one BERT baseline prediction file)

Baseline System

We provide a BERT-based baseline system for participants (check baseline directory for more info).

Results on other sets will be announced later.

QAC: Question-Level Accuracy

PAC: Passage-Level Accuracy
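
As a rough illustration of how the two metrics differ, the sketch below counts QAC over individual blanks and PAC over passages in which every blank is answered correctly. It is not the official script in the eval directory. The function name `qac_pac` and the input layout (a dict mapping a passage ID to a list of chosen candidate indices, one per blank) are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def qac_pac(gold: Dict[str, List[int]],
            pred: Dict[str, List[int]]) -> Tuple[float, float]:
    """Return (QAC, PAC) as percentages.

    Assumed layout (hypothetical, not the official file format):
    each dict maps a passage ID to a list of candidate indices,
    one index per blank in that passage.
    """
    total_blanks = 0
    correct_blanks = 0
    correct_passages = 0

    for passage_id, gold_answers in gold.items():
        # A passage missing from the predictions scores zero hits.
        pred_answers = pred.get(passage_id, [])
        hits = sum(
            1
            for i, answer in enumerate(gold_answers)
            if i < len(pred_answers) and pred_answers[i] == answer
        )
        total_blanks += len(gold_answers)
        correct_blanks += hits
        # A passage counts for PAC only if every blank is correct.
        if hits == len(gold_answers):
            correct_passages += 1

    qac = 100.0 * correct_blanks / total_blanks if total_blanks else 0.0
    pac = 100.0 * correct_passages / len(gold) if gold else 0.0
    return qac, pac


# Toy example: passage p1 is fully correct, p2 has both blanks wrong.
gold = {"p1": [2, 0, 1], "p2": [1, 0]}
pred = {"p1": [2, 0, 1], "p2": [0, 1]}
qac, pac = qac_pac(gold, pred)
print(f"QAC = {qac:.3f}%, PAC = {pac:.3f}%")
```

Because one wrong blank disqualifies the whole passage, PAC is much stricter than QAC, which is consistent with the gap between the two columns in the baseline results.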

| Data | Passage # | Query # | QAC | PAC | Fake Candidates | Availability |
| --- | --- | --- | --- | --- | --- | --- |
| Trial Data | 139 | 1,504 | 71.941% | 28.776% | No | Public |
| Train Data | 9,638 | 100,009 | N/A | N/A | No | Public |
| Development Data | 300 | 3,053 | 70.586% | 13.333% | Yes | Public |
| Qualifying Data | 500 | 5,081 | 70.01% | 8.20% | Yes | Semi-Hidden |
| Test Data | - | - | - | - | Yes | Hidden |

International Standard Language Resource Number (ISLRN)

ISLRN: 813-010-842-493-2


If you wish to use our data in your research, please cite our paper:

  title={A Sentence Cloze Dataset for Chinese Machine Reading Comprehension},
  author={Cui, Yiming and Liu, Ting and Yang, Ziqing and Chen, Zhipeng and Ma, Wentao and Che, Wanxiang and Wang, Shijin and Hu, Guoping},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)},

Organization Committee

Host: Chinese Information Processing Society of China (CIPS)
Organizer: Joint Laboratory of HIT and iFLYTEK Research (HFL)
Sponsor: iFLYTEK Co., Ltd. and iFLYTEK Research (Hebei)

Evaluation Co-Chairs

Ting Liu, Harbin Institute of Technology
Yiming Cui, Joint Laboratory of HIT and iFLYTEK Research

Official HFL WeChat Account

Follow Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.


Contact us

Any problems? Feel free to contact us.
Email: cmrc2019 [aT] 126 [DoT] com
Forum: CodaLab Competition Forum
CMRC 2019 Official Website (中文):
CMRC 2019 Official Website (English):
