A Deep Learning Based Knowledge Extraction Toolkit
for Knowledge Graph Construction
DeepKE is a knowledge extraction toolkit for knowledge graph construction supporting cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction. We provide documentation, Google Colab tutorials, an online demo, the paper, slides and a poster for beginners.
Reading Materials:
Data-Efficient Knowledge Graph Construction (Tutorial on CCKS 2022) [slides]
Efficient and Robust Knowledge Graph Construction (Tutorial on AACL-IJCNLP 2022) [slides]
Prompt Learning-related research works and toolkits for PLM-based KG Embedding Learning, Editing and Applications [Resources]
Reasoning with language model prompting [Survey][Paper-list]
Related Toolkit:
Doccano, MarkTool, LabelStudio: Data Annotation Toolkits
LambdaKG: A library and benchmark for PLM-based KG embeddings
DeepKE supports pip install deepke. We also provide a dockerfile to create the environment automatically.
There is a demonstration of prediction. The GIF file is created by Terminalizer. Get the code.
Take fully supervised relation extraction as an example.
Step1 Download the basic code
git clone --depth 1 https://github.com/zjunlp/DeepKE.git
Step2 Create a virtual environment using Anaconda
and enter it.
conda create -n deepke python=3.8
conda activate deepke
Install DeepKE with source code (Recommended)
python setup.py install
python setup.py develop
Install DeepKE with pip
pip install deepke
Step3 Enter the task directory
cd DeepKE/example/re/standard
Step4 Download the dataset, or follow the annotation instructions to obtain data
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
Many data formats are supported; details are described in each part.
Step5 Training (Parameters for training can be changed in the conf
folder)
We support visual parameter tuning by using wandb.
python run.py
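If you are new to wandb, its basic tracking pattern looks like the sketch below (a generic illustration, not DeepKE's internal integration; the project name is made up):

```python
# Generic wandb tracking sketch; DeepKE wires up its own integration internally,
# so this is NOT DeepKE's code.
import wandb

run = wandb.init(project="deepke-re-standard", mode="offline")  # hypothetical project name
for epoch in range(3):
    # log whatever metrics your training loop produces
    wandb.log({"epoch": epoch, "loss": 1.0 / (epoch + 1)})
run.finish()
```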
Step6 Prediction (Parameters for prediction can be changed in the conf
folder)
Modify the path of the trained model in predict.yaml. The absolute path of the model must be used, such as xxx/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth.
python predict.py
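As a quick sanity check (not part of DeepKE), you can verify that the absolute path you put into predict.yaml exists and can be deserialized; the path below is a placeholder:

```python
# Check that the checkpoint path written into predict.yaml is absolute and loadable.
import os
import torch

# hypothetical path; substitute the checkpoint produced by your own training run
ckpt_path = "/home/user/DeepKE/example/re/standard/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth"

assert os.path.isabs(ckpt_path), "predict.yaml expects an absolute path"
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))  # usually a state_dict (OrderedDict) or a pickled model
```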
Requirements: python == 3.8
Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.
The data is stored in .txt files, organized by the columns below (users can label data with the tools Doccano or MarkTool, or use weak supervision with DeepKE to obtain data automatically):
Sentence | Person | Location | Organization
---|---|---|---
Read the detailed process in specific README
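Under the hood, the raw .txt files commonly use a character-per-line layout ("char tag" pairs with a blank line between sentences); below is a minimal reader sketch under that assumption (check the task README for the exact format DeepKE expects):

```python
# Minimal reader sketch assuming a character-per-line NER file:
# each non-empty line is "char tag", sentences separated by blank lines.
def read_ner_file(path):
    sentences, chars, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                 # blank line ends a sentence
                if chars:
                    sentences.append((chars, tags))
                    chars, tags = [], []
                continue
            char, tag = line.split()[:2]
            chars.append(char)
            tags.append(tag)
    if chars:
        sentences.append((chars, tags))
    return sentences

# Example usage: sents = read_ner_file("data/train.txt")
```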
We provide the off-the-shelf model, DeepKE-cnSchema-NER, which will extract entities in cnSchema without training.
Step1 Enter DeepKE/example/ner/standard
. Download the dataset.
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data
folder and conf
folder respectively.
python run.py
Step3 Prediction
python predict.py
Step1 Enter DeepKE/example/ner/few-shot
. Download the dataset.
wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training in the low-resource setting
The directory from which the model is loaded and to which it is saved, as well as the configuration parameters, can be customized in the conf folder.
python run.py +train=few_shot
Users can modify load_path
in conf/train/few_shot.yaml
to use an existing trained model.
Step3 Add - predict
to conf/config.yaml
, modify load_path
as the model path and write_path
as the path where the predicted results are saved in conf/predict.yaml
, and then run python predict.py
python predict.py
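If you prefer to set those paths in a script instead of editing the YAML by hand, here is a small sketch assuming conf/predict.yaml is a plain YAML file readable by OmegaConf (the concrete paths below are placeholders):

```python
# Convenience sketch (not part of DeepKE) for setting the prediction paths.
from omegaconf import OmegaConf

cfg = OmegaConf.load("conf/predict.yaml")
cfg.load_path = "checkpoints/best_model.pth"   # placeholder: path of the trained model
cfg.write_path = "outputs/predictions.txt"     # placeholder: where predictions are written
OmegaConf.save(config=cfg, f="conf/predict.yaml")
```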
Step1 Enter DeepKE/example/ner/multimodal
. Download the dataset.
wget 120.27.214.45/Data/ner/multimodal/data.tar.gz
tar -xzvf data.tar.gz
We use RCNN-detected objects and visual grounding objects from the original images as local visual information, where RCNN detection is performed via faster_rcnn and visual grounding via onestage_grounding.
Step2 Training in the multimodal setting
The dataset and parameters can be customized in the data folder and conf folder respectively. To start from the model trained last time, set load_path in conf/train.yaml to the path where it was saved. The path for logs generated during training can be customized via log_dir.
python run.py
Step3 Prediction
python predict.py
Relation extraction is the task of extracting semantic relations between entities from unstructured text.
The data is stored in .csv files with the following columns (users can label data with the tools Doccano or MarkTool, or use weak supervision with DeepKE to obtain data automatically):
Sentence | Relation | Head | Head_offset | Tail | Tail_offset
---|---|---|---|---|---
Note: If there are multiple entity types for one relation, entity types can be prefixed with the relation as inputs.
Read the detailed process in specific README
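To quickly check that your own CSV matches this layout, a small sketch follows (not DeepKE code; the file path is illustrative and the header casing should be matched to the files you actually download):

```python
# Column check for the relation-extraction training CSV.
import pandas as pd

df = pd.read_csv("data/origin/train.csv")  # hypothetical path
expected = {"sentence", "relation", "head", "head_offset", "tail", "tail_offset"}
missing = expected - {c.lower() for c in df.columns}
print("missing columns:", missing or "none")
print(df.head())
```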
We provide the off-the-shelf model, DeepKE-cnSchema-RE, which will extract relations in cnSchema without training.
Step1 Enter the DeepKE/example/re/standard
folder. Download the dataset.
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data
folder and conf
folder respectively.
python run.py
Step3 Prediction
python predict.py
Step1 Enter DeepKE/example/re/few-shot
. Download the dataset.
wget 120.27.214.45/Data/re/few_shot/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively. To resume from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where it was saved. The path for logs generated during training can be customized via log_dir.
python run.py
Step3 Prediction
python predict.py
Step1 Enter DeepKE/example/re/document
. Download the dataset.
wget 120.27.214.45/Data/re/document/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively. To resume from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where it was saved. The path for logs generated during training can be customized via log_dir.
python run.py
Step3 Prediction
python predict.py
Step1 Enter DeepKE/example/re/multimodal
. Download the dataset.
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
We use RCNN-detected objects and visual grounding objects from the original images as local visual information, where RCNN detection is performed via faster_rcnn and visual grounding via onestage_grounding.
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively. To start from the model trained last time, set load_path in conf/train.yaml to the path where it was saved. The path for logs generated during training can be customized via log_dir.
python run.py
Step3 Prediction
python predict.py
Attribute extraction extracts attributes for entities from unstructured text.
The data is stored in .csv files with the following columns:
Sentence | Att | Ent | Ent_offset | Val | Val_offset
---|---|---|---|---|---
Read the detailed process in specific README
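Below is a small sanity check on the offsets, as a sketch only (the column names follow the table above and the file path is illustrative; adjust both to the headers in the files you actually download):

```python
# Verify that the attribute value appears in the sentence at its recorded offset.
import pandas as pd

df = pd.read_csv("data/origin/train.csv")   # hypothetical path
row = df.iloc[0]
start = int(row["Val_offset"])
val = str(row["Val"])
assert str(row["Sentence"])[start:start + len(val)] == val, "offset mismatch"
print("first row looks consistent")
```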
Step1 Enter the DeepKE/example/ae/standard
folder. Download the dataset.
wget 120.27.214.45/Data/ae/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data
folder and conf
folder respectively.
python run.py
Step3 Prediction
python predict.py
This toolkit provides many Jupyter Notebook
and Google Colab
tutorials. Users can study DeepKE with them.
Standard Setting
Low-resource
Document-level
Multimodal
1. Using the nearest mirror (THU in China) will speed up the installation of Anaconda; an aliyun mirror in China will speed up pip install XXX.
2. When encountering ModuleNotFoundError: No module named 'past', run pip install future.
3. Installing pretrained language models online can be slow. We recommend downloading pretrained models before use and saving them in the pretrained folder. Read the README.md in every task directory to check the specific requirements for saving pretrained models.
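For example, one way (not DeepKE-specific) to download a model ahead of time and save it under a local pretrained folder; the model name is only an example, and the expected layout is described in each task's README.md:

```python
# Pre-download a pretrained LM and cache it locally under `pretrained/`.
from transformers import AutoModel, AutoTokenizer

name = "bert-base-chinese"  # example model; pick the one your task requires
AutoTokenizer.from_pretrained(name).save_pretrained(f"pretrained/{name}")
AutoModel.from_pretrained(name).save_pretrained(f"pretrained/{name}")
```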
4. The old version of DeepKE is in the deepke-v1.0 branch; users can switch to that branch to use it. The old version has been fully migrated to the standard relation extraction setting (example/re/standard).
5. It is recommended to install DeepKE from source, since users may run into problems with pip on Windows, and modifications to the source code will not take effect with a pip installation; see the related issue.
6. More related low-resource knowledge extraction works can be found in Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective.
7. Make sure to install the exact package versions specified in requirements.txt.
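A small helper sketch (not part of DeepKE) for comparing installed package versions against the pins in requirements.txt:

```python
# Compare installed package versions against the "name==version" pins.
from importlib.metadata import PackageNotFoundError, version

with open("requirements.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "not installed"
        mark = "" if installed == pinned else "  <-- mismatch"
        print(f"{name}: required {pinned}, installed {installed}{mark}")
```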
In the next version, we plan to add event extraction to the toolkit.
Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. If you have any problems, please open an issue.
Please cite our paper if you use DeepKE in your work:
@article{zhang2022deepke,
title={DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population},
author={Zhang, Ningyu and Xu, Xin and Tao, Liankuan and Yu, Haiyang and Ye, Hongbin and Qiao, Shuofei and Xie, Xin and Chen, Xiang and Li, Zhoubo and Li, Lei and others},
journal={arXiv preprint arXiv:2201.03335},
year={2022}
}
Zhejiang University: Ningyu Zhang, Liankuan Tao, Xin Xu, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Peng Wang, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Wen Zhang, Guozhou Zheng, Huajun Chen
Community Contributors: thredreams, eltociear
Alibaba Group: Feiyu Xiong, Qiang Chen
DAMO Academy: Zhenru Zhang, Chuanqi Tan, Fei Huang