# ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
The dataset is available at https://drive.google.com/drive/folders/1rOAoKcLhMhge9uVQFM2_D1EU0AjnpWFa?usp=sharing. Download the data and put the JSON files in the `data/ReCO` directory; a quick sanity check for the downloaded files is sketched after the table below.
Train | Dev | Test-a | Test-b |
---|---|---|---|
250,000 | 30,000 | 10,000 | 10,000 |
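To verify the download, you can load one of the files with plain `json`. A minimal sketch, assuming each split is stored as a JSON list of example dicts; the file name `train.json` is an assumption, so use whatever names come with the Google Drive folder:

```python
import json

# Assumed file name -- substitute the actual file from the download.
with open("data/ReCO/train.json", encoding="utf-8") as f:
    data = json.load(f)  # assumed to be a list of example dicts

print(len(data))       # should be 250,000 for the train split
print(data[0].keys())  # per-example fields, e.g. 'passage' and 'doc'
```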
Requirements:

```
transformers
torch>=1.3.0
tqdm
joblib
apex  # for mixed-precision training
```
For BiDAF and other non-BERT models, go to the `BiDAF` folder and run the scripts there; note that the results are somewhat lower than those of the BERT-based models.
For single-node training:

```
python3 train.py --model_type=bert-base-chinese
```

For multi-GPU distributed training:

```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model_type=bert-base-chinese
```
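`--nproc_per_node` launches one worker process per GPU, so set it to the number of GPUs available on the machine (8 in the example above).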
If you want to use the original document as the context, change `clean(one['passage'])` in `prepare_data.py` (line 29) to `clean(one['doc'])`.
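A minimal, self-contained sketch of what that switch amounts to; `clean` and the example dict below are stand-ins for the real code in `prepare_data.py`, not the actual file:

```python
import re

def clean(text: str) -> str:
    # Stand-in for the repo's clean(); here it just collapses whitespace.
    return re.sub(r"\s+", " ", text).strip()

# One loaded example: 'passage' holds the evidence snippet,
# 'doc' the full original document (field names from the README).
one = {"passage": "证据片段 ...", "doc": "完整的原始文档 ..."}

context = clean(one["passage"])   # default: evidence-level context
# context = clean(one["doc"])     # switch to this for doc-level context
```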
Model Name | Model Type | Model Size (params) | Paper
---|---|---|---
Bert-base | `bert-base-chinese` | 102M | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBerta-large | `clue/roberta_chinese_large` | 325M | RoBERTa: A Robustly Optimized BERT Pretraining Approach
ALBERT-tiny | `voidful/albert_chinese_tiny` | 4.1M | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
ALBERT-base | `voidful/albert_chinese_base` | 10.5M | -
ALBERT-xxlarge | `voidful/albert_chinese_xxlarge` | 221M | -
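All of the checkpoints above can be pulled from the Hugging Face hub by the names in the Model Type column. A minimal loading sketch, shown with `bert-base-chinese`; the Chinese ALBERT/RoBERTa checkpoints above ship BERT-style vocabularies, so `BertTokenizer` is typically the right tokenizer class for them as well:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

# Encode a short Chinese query and run a forward pass.
inputs = tokenizer("这个观点是否正确?", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```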
To test a trained model:

```
python3 test.py --model_type=bert-base-chinese
```
Doc-level results:
Model | Dev | Test-a |
---|---|---|
BiDAF | 55.8 | 56.4 |
Bert-Base | 61.4 | 61.1 |
RoBerta-Large | 65.7 | 65.3 |
Human | -- | 88.0 |
Evidence-level results:
Model | Dev | Test-a |
---|---|---|
BiDAF | 68.9 | 68.4 |
Bert-Base | 76.3 | 77.1 |
RoBerta-Large | 78.7 | 79.2 |
ALBert-tiny | 70.9 | 70.4 |
ALBert-base | 76.9 | 77.3 |
ALBert-xxLarge | 80.8 | 81.2 |
Human | -- | 91.5 |
If you use ReCO in your research, please cite our work with the following BibTeX entry:
```
@inproceedings{DBLP:conf/aaai/WangYZXW20,
  author    = {Bingning Wang and
               Ting Yao and
               Qi Zhang and
               Jingfang Xu and
               Xiaochuan Wang},
  title     = {ReCO: {A} Large Scale Chinese Reading Comprehension Dataset on Opinion},
  booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI}
               2020, The Thirty-Second Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA,
               February 7-12, 2020},
  pages     = {9146--9153},
  publisher = {{AAAI} Press},
  year      = {2020},
  url       = {https://aaai.org/ojs/index.php/AAAI/article/view/6450},
  timestamp = {Thu, 04 Jun 2020 13:18:48 +0200},
  biburl    = {https://dblp.org/rec/conf/aaai/WangYZXW20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```