ReCO

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

Data

The dataset is available at https://drive.google.com/drive/folders/1rOAoKcLhMhge9uVQFM2_D1EU0AjnpWFa?usp=sharing

Download the data and place the JSON files in the data/ReCO directory.
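
As a quick sanity check after downloading, the files can be inspected with a short script. This is a minimal sketch, assuming each file in data/ReCO is a single JSON array of records; the function name `summarize_json_dir` is hypothetical and not part of this repository. If the files use a different layout (for example one JSON object per line), the loading step would need to change.

```python
import json
from pathlib import Path


def summarize_json_dir(data_dir="data/ReCO"):
    """Return {file name: number of records} for every .json file in data_dir.

    Assumption: each file holds one JSON array of records.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)
        counts[path.name] = len(records)
    return counts
```

Calling `summarize_json_dir()` from the repository root and comparing the counts with the Stats section is one way to confirm the download is complete.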

Stats

| Train | Dev | Test-a | Test-b |
| --- | --- | --- | --- |
| 250,000 | 30,000 | 10,000 | 10,000 |

Requirements

transformers
torch>=1.3.0
tqdm
joblib
apex (for mixed-precision training)
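
Before training, it may help to confirm that the packages listed above can be imported. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the check does not verify the `torch>=1.3.0` version constraint.

```python
import importlib.util

# Import names for the packages listed above; apex is only needed
# for mixed-precision training, so it is treated as optional here.
REQUIRED = ["transformers", "torch", "tqdm", "joblib"]
OPTIONAL = ["apex"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def report():
    """Print which required and optional packages are missing, if any."""
    print("missing required:", missing_packages(REQUIRED) or "none")
    print("missing optional:", missing_packages(OPTIONAL) or "none")
```

Running `report()` in the training environment lists anything that still needs to be installed.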

Train and Test

For BiDAF and other types of model, go to the BiDAF folder and run the code there. Note that these models score noticeably lower than the pre-trained ones (see Results).

Fine-tuning pre-trained models:

For single-process training:
python3 train.py --model_type=bert-base-chinese
For distributed training (8 processes per node):
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model_type=bert-base-chinese
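
With the launcher invocation above, `torch.distributed.launch` starts one copy of the script per process and, by default, passes each copy a `--local_rank` argument while exporting environment variables such as `WORLD_SIZE`. The sketch below illustrates only that argument handling, using the standard library. It is not the repository's train.py; the function name and every attribute other than `--model_type` and `--local_rank` are hypothetical.

```python
import argparse
import os


def parse_launch_args(argv=None):
    """Parse the arguments a launched training script would receive."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_type", default="bert-base-chinese")
    # torch.distributed.launch appends --local_rank=<n> for each process;
    # -1 is used here to mean "not started by the launcher".
    parser.add_argument("--local_rank", type=int, default=-1)
    args = parser.parse_args(argv)
    # The launcher also exports WORLD_SIZE; fall back to 1 when it is absent.
    args.world_size = int(os.environ.get("WORLD_SIZE", "1"))
    args.distributed = args.local_rank != -1
    return args
```

A script would typically use `args.local_rank` to pick its GPU and `args.distributed` to decide whether to initialise a process group.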

To use the original document as the context instead of the passage, change clean(one['passage']) to clean(one['doc']) on line 29 of prepare_data.py.
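
The same switch can be pictured as a flag rather than a source edit. This is only an illustrative sketch: the `clean` function here is a stand-in that collapses whitespace (the repository's own `clean` is not shown in this README), and `build_context` and `use_doc` are hypothetical names.

```python
def clean(text):
    """Stand-in for the repository's clean(); here it only collapses whitespace."""
    return " ".join(text.split())


def build_context(one, use_doc=False):
    """Return the cleaned context for one record.

    use_doc=False reads one['passage']; use_doc=True reads one['doc'].
    """
    field = "doc" if use_doc else "passage"
    return clean(one[field])
```

For a record `one`, `build_context(one)` corresponds to the default behaviour and `build_context(one, use_doc=True)` to the modified line.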

Model card

| Model Name | Model Type | Model Size | Paper |
| --- | --- | --- | --- |
| BERT-base | bert-base-chinese | 102M | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| RoBERTa-large | clue/roberta_chinese_large | 325M | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| ALBERT-tiny | voidful/albert_chinese_tiny | 4.1M | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
| ALBERT-base | voidful/albert_chinese_base | 10.5M | - |
| ALBERT-xxlarge | voidful/albert_chinese_xxlarge | 221M | - |
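
To train each listed model in turn, the `--model_type` values from the table can be substituted into the training commands shown earlier. The helper below is a sketch that only builds the command strings; the short dictionary keys and the function name `train_command` are labels chosen for this example, not identifiers from the repository.

```python
# --model_type values taken from the table above; the keys are
# arbitrary short labels used only in this example.
MODEL_TYPES = {
    "bert-base": "bert-base-chinese",
    "roberta-large": "clue/roberta_chinese_large",
    "albert-tiny": "voidful/albert_chinese_tiny",
    "albert-base": "voidful/albert_chinese_base",
    "albert-xxlarge": "voidful/albert_chinese_xxlarge",
}


def train_command(label, nproc_per_node=None):
    """Build the training command for one model.

    When nproc_per_node is given, the command goes through
    torch.distributed.launch; otherwise it runs train.py directly.
    """
    parts = ["python3"]
    if nproc_per_node:
        parts += ["-m", "torch.distributed.launch",
                  f"--nproc_per_node={nproc_per_node}"]
    parts += ["train.py", f"--model_type={MODEL_TYPES[label]}"]
    return " ".join(parts)
```

Iterating over `MODEL_TYPES` and printing `train_command(label)` gives one command per row of the table.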

Test

python3 test.py --model_type=bert-base-chinese

Results

Doc level

| Model | Dev | Test-a |
| --- | --- | --- |
| BiDAF | 55.8 | 56.4 |
| BERT-base | 61.4 | 61.1 |
| RoBERTa-large | 65.7 | 65.3 |
| Human | -- | 88.0 |

Evidence level

| Model | Dev | Test-a |
| --- | --- | --- |
| BiDAF | 68.9 | 68.4 |
| BERT-base | 76.3 | 77.1 |
| RoBERTa-large | 78.7 | 79.2 |
| ALBERT-tiny | 70.9 | 70.4 |
| ALBERT-base | 76.9 | 77.3 |
| ALBERT-xxlarge | 80.8 | 81.2 |
| Human | -- | 91.5 |
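
The distance between the best listed model and human performance on Test-a can be read off the two tables with a few lines of arithmetic. The snippet below is only a convenience sketch; the scores are copied from the tables above and the function name `gap_to_human` is hypothetical.

```python
# Test-a scores copied from the two results tables above.
DOC_LEVEL = {"BiDAF": 56.4, "BERT-base": 61.1, "RoBERTa-large": 65.3}
EVIDENCE_LEVEL = {
    "BiDAF": 68.4,
    "BERT-base": 77.1,
    "RoBERTa-large": 79.2,
    "ALBERT-tiny": 70.4,
    "ALBERT-base": 77.3,
    "ALBERT-xxlarge": 81.2,
}
HUMAN = {"doc": 88.0, "evidence": 91.5}


def gap_to_human(scores, human_score):
    """Return (best model name, human score minus best model score)."""
    best = max(scores, key=scores.get)
    return best, round(human_score - scores[best], 1)
```

With these numbers, the best listed model trails human performance by 22.7 points at the document level (RoBERTa-large) and by 10.3 points at the evidence level (ALBERT-xxlarge).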

Citation

If you use ReCO in your research, please cite our work using the following BibTeX entry:

@inproceedings{DBLP:conf/aaai/WangYZXW20,
  author    = {Bingning Wang and
               Ting Yao and
               Qi Zhang and
               Jingfang Xu and
               Xiaochuan Wang},
  title     = {ReCO: {A} Large Scale Chinese Reading Comprehension Dataset on Opinion},
  booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI}
               2020, The Thirty-Second Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA,
               February 7-12, 2020},
  pages     = {9146--9153},
  publisher = {{AAAI} Press},
  year      = {2020},
  url       = {https://aaai.org/ojs/index.php/AAAI/article/view/6450},
  timestamp = {Thu, 04 Jun 2020 13:18:48 +0200},
  biburl    = {https://dblp.org/rec/conf/aaai/WangYZXW20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}