CBLUE


AI (Artificial Intelligence) plays an indispensable role in the biomedical field, helping to improve medical technology. To further accelerate AI research in the biomedical field, we present the Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark, which includes datasets collected from real-world biomedical scenarios, baseline models, and an online platform for model evaluation, comparison, and analysis.

CBLUE Benchmark

We evaluate 11 current Chinese pre-trained models on the eight biomedical language understanding tasks and report their baseline results.

| Model | CMedEE | CMedIE | CDN | CTC | STS | QIC | QTR | QQR | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| BERT-base | 62.1 | 54.0 | 55.4 | 69.2 | 83.0 | 84.3 | 60.0 | 84.7 | 69.0 |
| BERT-wwm-ext-base | 61.7 | 54.0 | 55.4 | 70.1 | 83.9 | 84.5 | 60.9 | 84.4 | 69.4 |
| ALBERT-tiny | 50.5 | 35.9 | 50.2 | 61.0 | 79.7 | 75.8 | 55.5 | 79.8 | 61.1 |
| ALBERT-xxlarge | 61.8 | 47.6 | 37.5 | 66.9 | 84.8 | 84.8 | 62.2 | 83.1 | 66.1 |
| RoBERTa-large | 62.1 | 54.4 | 56.5 | 70.9 | 84.7 | 84.2 | 60.9 | 82.9 | 69.6 |
| RoBERTa-wwm-ext-base | 62.4 | 53.7 | 56.4 | 69.4 | 83.7 | 85.5 | 60.3 | 82.7 | 69.3 |
| RoBERTa-wwm-ext-large | 61.8 | 55.9 | 55.7 | 69.0 | 85.2 | 85.3 | 62.8 | 84.4 | 70.0 |
| PCL-MedBERT | 60.6 | 49.1 | 55.8 | 67.8 | 83.8 | 84.3 | 59.3 | 82.5 | 67.9 |
| ZEN | 61.0 | 50.1 | 57.8 | 68.6 | 83.5 | 83.2 | 60.3 | 83.0 | 68.4 |
| MacBERT-base | 60.7 | 53.2 | 57.7 | 67.7 | 84.4 | 84.9 | 59.7 | 84.0 | 69.0 |
| MacBERT-large | 62.4 | 51.6 | 59.3 | 68.6 | 85.6 | 82.7 | 62.9 | 83.5 | 69.6 |
| Human | 67.0 | 66.0 | 65.0 | 78.0 | 93.0 | 88.0 | 71.0 | 89.0 | 77.1 |

Baseline of tasks

We present baseline models for the biomedical tasks and release the corresponding code for a quick start.

Requirements

python3 / pytorch 1.7 / transformers 4.5.1 / jieba / gensim / sklearn

Data preparation

Download dataset

The whole zip package includes the datasets of 8 biomedical NLU tasks (more details in the following sections). Each task includes the following files:

 {Task}
|   {Task}_train.json
|   {Task}_test.json
|   {Task}_dev.json
|   example_gold.json
|   example_pred.json
|   README.md

Notice: a few tasks have additional files; e.g. the CHIP-CTC task includes a category.xlsx file.
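
Since each split is a standalone JSON file, loading one is straightforward; below is a minimal sketch (the helper name is illustrative, and the per-task JSON layout varies, see the task examples later in this README):

import json

def load_split(task_dir, task_name, split):
    # e.g. load_split("CBLUEDatasets/CMeEE", "CMeEE", "train"),
    # assuming the {Task}_{split}.json naming shown above
    path = f"{task_dir}/{task_name}_{split}.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)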

You can download the Chinese pre-trained models you need (download URLs are provided above). With Huggingface-Transformers, these models can be easily accessed and loaded.
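
For instance, a locally downloaded checkpoint can be loaded in a couple of lines; this is a minimal sketch, assuming the data/model_data/{MODEL_NAME} layout used by the example scripts below:

from transformers import AutoTokenizer, AutoModel

# the path mirrors MODEL_DIR/MODEL_NAME from the example shell scripts
model_dir = "data/model_data/chinese-bert-wwm"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir)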

The reference directory:

 CBLUE         
|   baselines
|      run_classifier.py
|      ...
|   examples
|      run_qqr.sh
|      ...
|   cblue
|   CBLUEDatasets
|      KUAKE-QQR
|      ...
|   data
|      output
|      model_data
|         bert-base
|         ...
|      result_output
|         KUAKE-QQR_test.json
|         ...

Running examples

The shell files for training and evaluating each task are provided in examples/ and can be run directly.

Alternatively, you can use the scripts in baselines/ and write your own shell files as needed:

  • baselines/run_classifier.py: supports the {sts, qqr, qtr, qic, ctc, ee} tasks;
  • baselines/run_cdn.py: supports the {cdn} task;
  • baselines/run_ie.py: supports the {ie} task.

Training models

Run the shell files with bash examples/run_{task}.sh; their contents are as follows:

DATA_DIR="CBLUEDatasets"

TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"

MAX_LENGTH=128

python baselines/run_classifier.py \
    --data_dir=${DATA_DIR} \
    --model_type=${MODEL_TYPE} \
    --model_dir=${MODEL_DIR} \
    --model_name=${MODEL_NAME} \
    --task_name=${TASK_NAME} \
    --output_dir=${OUTPUT_DIR} \
    --result_output_dir=${RESULT_OUTPUT_DIR} \
    --do_train \
    --max_length=${MAX_LENGTH} \
    --train_batch_size=16 \
    --eval_batch_size=16 \
    --learning_rate=3e-5 \
    --epochs=3 \
    --warmup_proportion=0.1 \
    --earlystop_patience=3 \
    --logging_steps=250 \
    --save_steps=250 \
    --seed=2021

Notice: the best checkpoint is saved in OUTPUT_DIR/MODEL_NAME/.

  • MODEL_TYPE: supports the {bert, roberta, albert, zen} model types;
  • MODEL_NAME: supports the {bert-base, bert-wwm-ext, albert-tiny, albert-xxlarge, zen, pcl-medbert, roberta-large, roberta-wwm-ext-base, roberta-wwm-ext-large, macbert-base, macbert-large} Chinese pre-trained models.

The MODEL_TYPE-MODEL_NAME mappings are listed below.

| MODEL_TYPE | MODEL_NAME |
|---|---|
| bert | bert-base, bert-wwm-ext, pcl-medbert, macbert-base, macbert-large |
| roberta | roberta-large, roberta-wwm-ext-base, roberta-wwm-ext-large |
| albert | albert-tiny, albert-xxlarge |
| zen | zen |

Inference & generation of results

Run the shell files with bash examples/run_{task}.sh predict; their contents are as follows:

DATA_DIR="CBLUEDatasets"

TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"

MAX_LENGTH=128

python baselines/run_classifier.py \
    --data_dir=${DATA_DIR} \
    --model_type=${MODEL_TYPE} \
    --model_name=${MODEL_NAME} \
    --model_dir=${MODEL_DIR} \
    --task_name=${TASK_NAME} \
    --output_dir=${OUTPUT_DIR} \
    --result_output_dir=${RESULT_OUTPUT_DIR} \
    --do_predict \
    --max_length=${MAX_LENGTH} \
    --eval_batch_size=16 \
    --seed=2021

Notice: the prediction result {TASK_NAME}_test.json will be generated in RESULT_OUTPUT_DIR.

Check format

Before submitting the predicted test files, you can check their format with the format_checker scripts to avoid an invalid evaluation score caused by format errors.

  • Step 1: Copy the original test file (without answers) {taskname}_test.[json|jsonl|tsv] to the format_checker directory and rename it {taskname}_test_raw.[json|jsonl|tsv].
# take the CMeEE task for example:
cp ${path_to_CMeEE}/CMeEE_test.json ${current_dir}/CMeEE_test_raw.json 
  • Step 2: Run the format_checker script on the raw test file (from Step 1) and your prediction file:
python3 format_checker_${taskname}.py {taskname}_test_raw.[json|jsonl|tsv] {taskname}_test.[json|jsonl|tsv] 

# take the CMeEE task for example:
python3 format_checker_CMeEE.py CMeEE_test_raw.json CMeEE_test.json
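
The precise checks are task-specific and live in each format_checker_{taskname}.py script; purely as an illustration (not the actual checker logic), a checker of this kind at minimum verifies that the prediction file parses and covers every test record:

import json
import sys

def basic_check(raw_path, pred_path):
    # illustrative only: the real checkers also validate task-specific fields
    with open(raw_path, encoding="utf-8") as f:
        raw = json.load(f)
    with open(pred_path, encoding="utf-8") as f:
        pred = json.load(f)
    assert len(raw) == len(pred), "prediction must cover every test record"
    print("basic checks passed")

if __name__ == "__main__":
    basic_check(sys.argv[1], sys.argv[2])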

What is special?

IMCS-NER & IMCS-V2-NER tasks:
  • Step 1: Copy both the original test file (without answers) IMCS-NER_test.json (IMCS-V2-NER_test.json) and IMCS_test.json (IMCS-V2_test.json) to this directory, and rename the former IMCS-NER_test_raw.json (IMCS-V2-NER_test_raw.json):
# for IMCS-NER task:
cp ${path_to_IMCS-NER}/IMCS-NER_test.json ${current_dir}/IMCS-NER_test_raw.json 
cp ${path_to_IMCS-NER}/IMCS_test.json ${current_dir}
# for IMCS-V2-NER task:
cp ${path_to_IMCS-V2-NER}/IMCS-V2-NER_test.json ${current_dir}/IMCS-V2-NER_test_raw.json 
cp ${path_to_IMCS-V2-NER}/IMCS-V2_test.json ${current_dir}
  • Step 2: Run the format_checker script on the raw test file (from Step 1) and your prediction file:
# for IMCS-NER task:
python3 format_checker_IMCS_V1_NER.py  IMCS-NER_test_raw.json IMCS-NER_test.json IMCS_test.json
# for IMCS-V2-NER task:
python3 format_checker_IMCS_V2_NER.py  IMCS-V2-NER_test_raw.json IMCS-V2-NER_test.json IMCS-V2_test.json
IMCS-SR & IMCS-V2-SR, MedDG tasks

If you want to enable the optional check logic in the check_format function, which is commented out in the master branch, you also need to copy the normalized dictionary files to the current directory.

  • MedDG: the dictionary file is entity_list.txt
  • IMCS-SR: the dictionary file is symptom_norm.csv
  • IMCS-V2-SR: the dictionary file is mappings.json

Submit results

Compress RESULT_OUTPUT_DIR into a .zip file and submit it; you will receive evaluation scores on these biomedical NLU tasks, along with your ranking!
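
The .zip file can be produced with any archiver; for example, with Python's standard library (a convenience sketch, assuming the default RESULT_OUTPUT_DIR):

import shutil

# pack data/result_output into CBLUE_submission.zip for upload
shutil.make_archive("CBLUE_submission", "zip", "data/result_output")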

Submit your results!


Introduction of tasks

To promote the development and application of language models in the biomedical field, we collected data from real-world biomedical scenarios and released eight biomedical NLU (natural language understanding) tasks, covering information extraction from medical text (named entity recognition, relation extraction), medical term normalization, medical text classification, medical sentence similarity estimation, and medical QA.

| Dataset | Task | Train | Dev | Test | Evaluation Metric |
|---|---|---|---|---|---|
| CMeEE | NER | 15,000 | 5,000 | 3,000 | Micro F1 |
| CMeIE | Relation Extraction | 14,339 | 3,585 | 4,482 | Micro F1 |
| CHIP-CDN | Diagnosis Normalization | 6,000 | 2,000 | 10,192 | Micro F1 |
| CHIP-STS | Sentence Similarity | 16,000 | 4,000 | 10,000 | Macro F1 |
| CHIP-CTC | Sentence Classification | 22,962 | 7,682 | 10,000 | Macro F1 |
| KUAKE-QIC | Sentence Classification | 6,931 | 1,955 | 1,944 | Accuracy |
| KUAKE-QTR | NLI | 24,174 | 2,913 | 5,465 | Accuracy |
| KUAKE-QQR | NLI | 15,000 | 1,600 | 1,596 | Accuracy |

CMeEE

The evaluation task is named entity recognition on medical text. Given the schema and medical sentences, models are expected to extract clinical entities and assign each entity its correct type.

example { "text": "", "entities": [ { "start_idx": 0, "end_idx": 2, "type": "bod", "entity: "" }, { "start_idx": 0, "end_idx": 4, "type": "sym", "entity: "" }, { "start_idx": 6, "end_idx": 9, "type": "bod", "entity: "" }, { "start_idx": 6, "end_idx": 11, "type": "sym", "entity: "" }, { "start_idx": 15, "end_idx": 18, "type": "sym", "entity: "" }, { "start_idx": 22, "end_idx": 23, "type": "dis", "entity: "" }, { "start_idx": 25, "end_idx": 27, "type": "dis", "entity: "" } ] }

CMeIE

The evaluation task is entity relation extraction on medical text. Given the schema and medical sentences, models are expected to automatically extract triples [(S1, P1, O1), (S2, P2, O2)] that satisfy the schema constraints. The schema defines the category of the predicate and the corresponding subject and object types, e.g.

subject_type: … , predicate: … , object_type: …
subject_type: … , predicate: … , object_type: …

example { "text": "@ ### 1964 (5-50Gy) @", "spo_list": [ { "Combined": true, "predicate": "", "subject": "", "subject_type": "", "object": { "@value": "" }, "object_type": { "@value": "" } }, { "Combined": true, "predicate": "", "subject": "", "subject_type": "", "object": { "@value": "" }, "object_type": { "@value": "" } } } ] }

CHIP-CDN

The evaluation task is the normalization of diagnosis entities from Chinese medical records. Given a diagnosis mention, models are expected to return the corresponding standard term(s).

example [ { "text": "", "normalized_result": "##" }, { "text": ";;", "normalized_result": "########" }, { "text": "IV", "normalized_result": "##" } ]

CHIP-CTC

In this evaluation task, given 44 semantic categories of screening criteria (detailed in category.xlsx) and descriptions of Chinese clinical screening criteria, models are expected to assign each description its specific category.

example [ { "id": "s1", "label": "Multiple", "text": " 7.INR1.5 PTULN+4 APTT >1.5 ULN" }, { "id": "s2", "label": "Addictive Behavior", "text": " 210/" }, { "id": "s3", "label": "Therapy or Surgery", "text": " 13. " } ]

CHIP-STS

In this evaluation task, given pairs of sentences involving five different diseases, models are expected to judge the semantic similarity of the pair of sentences.

example [ { "id": "1", "text1": "", "text2": "", "label": "1", "category": "diabetes" }, { "id": "2", "text1": "", "text2": "", "label": "1", "category": "diabetes" }, { "id": "3", "text1": "", "text2": "", "label": "0", "category": "hepatitis" } ]

KUAKE-QIC

In this evaluation task, given a medical query, models are expected to classify the patient's intention. The queries fall into 11 categories: diagnosis, cause, method, advice, metric explain, disease expression, result, attention, effect, price, other.

example [ { "id": "s1", "query": "", "label": "" }, { "id": "s2", "query": "19255", "label": "" }, { "id": "s3", "query": "", "label": "" } ]

KUAKE-QTR

In this evaluation task, given a query-title pair, models are expected to predict whether the query and the title address the same topic, and to what extent they are consistent.

example [ { "id": "s1", "query": "", "title": "", "label": "2" }, { "id": "s2", "query": "", "title": " ...", "label": "1" }, { "id": "s3", "query": "", "title": "...", "label": "1" } ]

KUAKE-QQR

In this evaluation task, given a pair of queries, models are expected to predict the extent of similarity between them.

example [ { "id": "s1", "query": "", "title": "", "label": "2" }, { "id": "s2", "query": "", "title": "", "label": "0" }, { "id": "s3", "query": "", "title": "", "label": "2" } ]

Quick start

The Data Processor and Trainer modules can be found in cblue/. You can easily build on them to train and evaluate your own models and methods. The corresponding Data Processor, Dataset, and Trainer classes for the eight tasks are listed below:

| Task | Data Processor (cblue.data) | Dataset (cblue.data) | Trainer (cblue.trainer) |
|---|---|---|---|
| CMeEE | EEDataProcessor | EEDataset | EETrainer |
| CMeIE | ERDataProcessor / REDataProcessor | ERDataset / REDataset | ERTrainer / RETrainer |
| CHIP-CDN | CDNDataProcessor | CDNDataset | CDNForCLSTrainer / CDNForNUMTrainer |
| CHIP-CTC | CTCDataProcessor | CTCDataset | CTCTrainer |
| CHIP-STS | STSDataProcessor | STSDataset | STSTrainer |
| KUAKE-QIC | QICDataProcessor | QICDataset | QICTrainer |
| KUAKE-QQR | QQRDataProcessor | QQRDataset | QQRTrainer |
| KUAKE-QTR | QTRDataProcessor | QTRDataset | QTRTrainer |

Example for CMeEE

from cblue.data import EEDataProcessor, EEDataset
from cblue.trainer import EETrainer
from cblue.metrics import ee_metric, ee_commit_prediction

# get samples
data_processor = EEDataProcessor(root=...)
train_samples = data_processor.get_train_sample()
eval_samples = data_processor.get_dev_sample()
test_samples = data_processor.get_test_sample()

# wrap the samples in a 'torch.utils.data.Dataset'
train_dataset = EEDataset(train_samples, tokenizer=..., mode='train', max_length=...)

# train the model
trainer = EETrainer(...)
trainer.train(...)

# prediction and generation of results
test_dataset = EEDataset(test_samples, tokenizer=..., mode='test', max_length=...)
trainer.predict(test_dataset)

Training setup

We list the hyper-parameters used for each task in the baseline experiments.

Common hyper-parameters

| Param | Value |
|---|---|
| warmup_proportion | 0.1 |
| weight_decay | 0.01 |
| adam_epsilon | 1e-8 |
| max_grad_norm | 1.0 |
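
These common values correspond to a standard AdamW-with-linear-warmup setup; the sketch below shows how they would typically be wired with Huggingface Transformers (the baseline scripts configure this internally, so the helper name here is illustrative):

from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, total_steps, learning_rate):
    # weight_decay and adam_epsilon follow the table above;
    # max_grad_norm=1.0 is applied as gradient clipping in the training loop
    optimizer = AdamW(model.parameters(), lr=learning_rate,
                      eps=1e-8, weight_decay=0.01)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),  # warmup_proportion
        num_training_steps=total_steps)
    return optimizer, scheduler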

CMeEE

Hyper-parameters for the training of pre-trained models with a token classification head on top for named entity recognition of the CMeEE task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 5 | 32 | 128 | 4e-5 |
| bert-wwm-ext | 5 | 32 | 128 | 4e-5 |
| roberta-wwm-ext | 5 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 5 | 12 | 65 | 2e-5 |
| roberta-large | 5 | 12 | 65 | 2e-5 |
| albert-tiny | 10 | 32 | 128 | 5e-5 |
| albert-xxlarge | 5 | 12 | 65 | 1e-5 |
| zen | 5 | 20 | 128 | 4e-5 |
| macbert-base | 5 | 32 | 128 | 4e-5 |
| macbert-large | 5 | 12 | 80 | 2e-5 |
| PCL-MedBERT | 5 | 32 | 128 | 4e-5 |

CMeIE-ER

Hyper-parameters for the training of pre-trained models with a token-level classifier for subject and object recognition of the CMeIE task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 7 | 32 | 128 | 5e-5 |
| bert-wwm-ext | 7 | 32 | 128 | 5e-5 |
| roberta-wwm-ext | 7 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 7 | 16 | 80 | 4e-5 |
| roberta-large | 7 | 16 | 80 | 2e-5 |
| albert-tiny | 10 | 32 | 128 | 4e-5 |
| albert-xxlarge | 7 | 16 | 80 | 1e-5 |
| zen | 7 | 20 | 128 | 4e-5 |
| macbert-base | 7 | 32 | 128 | 4e-5 |
| macbert-large | 7 | 20 | 80 | 2e-5 |
| PCL-MedBERT | 7 | 32 | 128 | 4e-5 |

CMeIE-RE

Hyper-parameters for the training of pre-trained models with a classifier for the entity pairs relation prediction of the CMeIE task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 8 | 32 | 128 | 5e-5 |
| bert-wwm-ext | 8 | 32 | 128 | 5e-5 |
| roberta-wwm-ext | 8 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 8 | 16 | 80 | 4e-5 |
| roberta-large | 8 | 16 | 80 | 2e-5 |
| albert-tiny | 10 | 32 | 128 | 4e-5 |
| albert-xxlarge | 8 | 16 | 80 | 1e-5 |
| zen | 8 | 20 | 128 | 4e-5 |
| macbert-base | 8 | 32 | 128 | 4e-5 |
| macbert-large | 8 | 20 | 80 | 2e-5 |
| PCL-MedBERT | 8 | 32 | 128 | 4e-5 |

CHIP-CTC

Hyper-parameters for the training of pre-trained models with a sequence classification head on top for screening criteria classification of the CHIP-CTC task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 5 | 32 | 128 | 5e-5 |
| bert-wwm-ext | 5 | 32 | 128 | 5e-5 |
| roberta-wwm-ext | 5 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 5 | 32 | 50 | 3e-5 |
| roberta-large | 5 | 24 | 50 | 2e-5 |
| albert-tiny | 10 | 32 | 128 | 4e-5 |
| albert-xxlarge | 5 | 20 | 50 | 1e-5 |
| zen | 5 | 20 | 128 | 4e-5 |
| macbert-base | 5 | 32 | 128 | 4e-5 |
| macbert-large | 5 | 20 | 50 | 2e-5 |
| PCL-MedBERT | 5 | 32 | 128 | 4e-5 |

CHIP-CDN-cls

Hyper-parameters for the CHIP-CDN task. We model the CHIP-CDN task in two stages: a recall stage and a ranking stage. num_negative_sample sets the number of negative samples drawn when training the ranking model in the ranking stage, and recall_k sets the number of candidates recalled in the recall stage.

| Param | Value |
|---|---|
| recall_k | 200 |
| num_negative_sample | 5+5 (random) |

Hyper-parameters for the training of pre-trained models with a sequence classifier for the ranking model of the CHIP-CDN task. We encode each pair of the original term and a standard phrase recalled in the recall stage, then pass the pooled output to a classifier that predicts the relevance between the original term and the standard phrase.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 3 | 32 | 128 | 4e-5 |
| bert-wwm-ext | 3 | 32 | 128 | 5e-5 |
| roberta-wwm-ext | 3 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 3 | 32 | 40 | 4e-5 |
| roberta-large | 3 | 32 | 40 | 4e-5 |
| albert-tiny | 3 | 32 | 128 | 4e-5 |
| albert-xxlarge | 3 | 32 | 40 | 1e-5 |
| zen | 3 | 20 | 128 | 4e-5 |
| macbert-base | 3 | 32 | 128 | 4e-5 |
| macbert-large | 3 | 32 | 40 | 2e-5 |
| PCL-MedBERT | 3 | 32 | 128 | 4e-5 |
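
Conceptually, the ranking stage scores each (original term, recalled candidate) pair with a sequence classifier; below is a simplified sketch under that assumption (the helper is illustrative, the actual implementation lives in cblue/):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def rank_candidates(term, candidates, model_dir, max_length=40):
    # score each (term, candidate) pair and sort by relevance
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    enc = tokenizer([term] * len(candidates), candidates,
                    truncation=True, max_length=max_length,
                    padding=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**enc).logits.softmax(dim=-1)[:, 1]  # P(relevant)
    return sorted(zip(candidates, scores.tolist()),
                  key=lambda x: x[1], reverse=True)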

CHIP-CDN-num

Hyper-parameters for the training of pre-trained models with a sequence classifier that predicts how many standard phrases correspond to the original term in the CHIP-CDN task. The predicted number determines how many of the most relevant standard phrases are kept, in combination with the predictions of the ranking model.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 20 | 32 | 128 | 4e-5 |
| bert-wwm-ext | 20 | 32 | 128 | 5e-5 |
| roberta-wwm-ext | 20 | 32 | 128 | 4e-5 |
| roberta-wwm-ext-large | 20 | 12 | 40 | 4e-5 |
| roberta-large | 20 | 12 | 40 | 4e-5 |
| albert-tiny | 20 | 32 | 128 | 4e-5 |
| albert-xxlarge | 20 | 12 | 40 | 1e-5 |
| zen | 20 | 20 | 128 | 4e-5 |
| macbert-base | 20 | 32 | 128 | 4e-5 |
| macbert-large | 20 | 12 | 40 | 2e-5 |
| PCL-MedBERT | 20 | 32 | 128 | 4e-5 |
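
Combining the two models, the final answer keeps the top-ranked candidates up to the predicted count; a minimal sketch, where ranked is the (candidate, score) list from the ranking sketch above:

def select_standard_terms(ranked, predicted_num):
    # keep the predicted number of best-scoring standard phrases
    return [cand for cand, _ in ranked[:predicted_num]]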

CHIP-STS

Hyper-parameters for the training of pre-trained models with a sequence classifier for sentence similarity prediction of the CHIP-STS task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 3 | 16 | 40 | 3e-5 |
| bert-wwm-ext | 3 | 16 | 40 | 3e-5 |
| roberta-wwm-ext | 3 | 16 | 40 | 4e-5 |
| roberta-wwm-ext-large | 3 | 16 | 40 | 4e-5 |
| roberta-large | 3 | 16 | 40 | 2e-5 |
| albert-tiny | 3 | 16 | 40 | 5e-5 |
| albert-xxlarge | 3 | 16 | 40 | 1e-5 |
| zen | 3 | 16 | 40 | 2e-5 |
| macbert-base | 3 | 16 | 40 | 3e-5 |
| macbert-large | 3 | 16 | 40 | 3e-5 |
| PCL-MedBERT | 3 | 16 | 40 | 2e-5 |

KUAKE-QIC

Hyper-parameters for the training of pre-trained models with a sequence classifier for query intention prediction of the KUAKE-QIC task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 3 | 16 | 50 | 2e-5 |
| bert-wwm-ext | 3 | 16 | 50 | 2e-5 |
| roberta-wwm-ext | 3 | 16 | 50 | 2e-5 |
| roberta-wwm-ext-large | 3 | 16 | 50 | 2e-5 |
| roberta-large | 3 | 16 | 50 | 3e-5 |
| albert-tiny | 3 | 16 | 50 | 5e-5 |
| albert-xxlarge | 3 | 16 | 50 | 1e-5 |
| zen | 3 | 16 | 50 | 2e-5 |
| macbert-base | 3 | 16 | 50 | 3e-5 |
| macbert-large | 3 | 16 | 50 | 2e-5 |
| PCL-MedBERT | 3 | 16 | 50 | 2e-5 |

KUAKE-QTR

Hyper-parameters for the training of pre-trained models with a sequence classifier for query-title pairs relevance prediction of the KUAKE-QTR task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 3 | 16 | 40 | 4e-5 |
| bert-wwm-ext | 3 | 16 | 40 | 2e-5 |
| roberta-wwm-ext | 3 | 16 | 40 | 3e-5 |
| roberta-wwm-ext-large | 3 | 16 | 40 | 2e-5 |
| roberta-large | 3 | 16 | 40 | 2e-5 |
| albert-tiny | 3 | 16 | 40 | 5e-5 |
| albert-xxlarge | 3 | 16 | 40 | 1e-5 |
| zen | 3 | 16 | 40 | 3e-5 |
| macbert-base | 3 | 16 | 40 | 2e-5 |
| macbert-large | 3 | 16 | 40 | 2e-5 |
| PCL-MedBERT | 3 | 16 | 40 | 3e-5 |

KUAKE-QQR

Hyper-parameters for the training of pre-trained models with a sequence classifier for query-query pairs relevance prediction of the KUAKE-QQR task.

| Model | epoch | batch_size | max_length | learning_rate |
|---|---|---|---|---|
| bert-base | 3 | 16 | 30 | 3e-5 |
| bert-wwm-ext | 3 | 16 | 30 | 3e-5 |
| roberta-wwm-ext | 3 | 16 | 30 | 3e-5 |
| roberta-wwm-ext-large | 3 | 16 | 30 | 3e-5 |
| roberta-large | 3 | 16 | 30 | 2e-5 |
| albert-tiny | 3 | 16 | 30 | 5e-5 |
| albert-xxlarge | 3 | 16 | 30 | 3e-5 |
| zen | 3 | 16 | 30 | 2e-5 |
| macbert-base | 3 | 16 | 30 | 2e-5 |
| macbert-large | 3 | 16 | 30 | 2e-5 |
| PCL-MedBERT | 3 | 16 | 30 | 2e-5 |

How to Cite

@inproceedings{zhang-etal-2022-cblue,
    title = "{CBLUE}: A {C}hinese Biomedical Language Understanding Evaluation Benchmark",
    author = "Zhang, Ningyu  and
      Chen, Mosha  and
      Bi, Zhen  and
      Liang, Xiaozhuan  and
      Li, Lei  and
      Shang, Xin  and
      Yin, Kangping  and
      Tan, Chuanqi  and
      Xu, Jian  and
      Huang, Fei  and
      Si, Luo  and
      Ni, Yuan  and
      Xie, Guotong  and
      Sui, Zhifang  and
      Chang, Baobao  and
      Zong, Hui  and
      Yuan, Zheng  and
      Li, Linfeng  and
      Yan, Jun  and
      Zan, Hongying  and
      Zhang, Kunli  and
      Tang, Buzhou  and
      Chen, Qingcai",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.544",
    pages = "7888--7915",
    abstract = "Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually offering great promise for medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.",
}

