Project Name | Stars | Downloads | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
---|---|---|---|---|---|---|---|---|---|
Deepwalk | 2,513 | 7 | 5 months ago | 4 | April 29, 2018 | 42 | other | Python | DeepWalk - Deep Learning for Graphs |
Awesome Network Embedding | 2,218 | | 2 years ago | | | 3 | | | A curated list of network embedding techniques. |
Awesome Knowledge Graph | 2,122 | | 2 years ago | | | 4 | | | A curated collection of learning resources related to knowledge graphs. |
Ampligraph | 1,908 | | 3 months ago | 12 | May 25, 2021 | 32 | apache-2.0 | Python | Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org |
Capsgnn | 1,180 | | 2 months ago | | | 3 | gpl-3.0 | Python | A PyTorch implementation of "Capsule Graph Neural Network" (ICLR 2019). |
Dgl Ke | 1,091 | | 2 months ago | 6 | January 25, 2021 | 55 | apache-2.0 | Python | High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. |
Word2vec Graph | 650 | | 2 years ago | | | 1 | | Python | Exploring word2vec embeddings as a graph of nearest neighbors |
Conve | 574 | | 7 months ago | | | 22 | mit | Python | Convolutional 2D Knowledge Graph Embeddings resources |
Compgcn | 519 | | 3 months ago | | | 11 | apache-2.0 | Python | ICLR 2020: Composition-Based Multi-Relational Graph Convolutional Networks |
Cleora | 434 | | a month ago | | | 12 | other | Jupyter Notebook | Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data. |
Convolutional 2D Knowledge Graph Embeddings resources.
Paper: Convolutional 2D Knowledge Graph Embeddings
FB15k and WN18 were used in the paper, but do not use these datasets for your research: they suffer from test leakage through inverse relations. Please also note that the Kinship and Nations datasets have a high number of inverse relationships, which makes them unsuitable for research: Nations has over 95% inverse relationships and Kinship about 48%.
Dataset | MR | MRR | Hits@10 | Hits@3 | Hits@1 |
---|---|---|---|---|---|
FB15k | 64 | 0.75 | 0.87 | 0.80 | 0.67 |
WN18 | 504 | 0.94 | 0.96 | 0.95 | 0.94 |
FB15k-237 | 246 | 0.32 | 0.49 | 0.35 | 0.24 |
WN18RR | 4766 | 0.43 | 0.51 | 0.44 | 0.39 |
YAGO3-10 | 2792 | 0.52 | 0.66 | 0.56 | 0.45 |
Nations | 2 | 0.82 | 1.00 | 0.88 | 0.72 |
UMLS | 1 | 0.94 | 0.99 | 0.97 | 0.92 |
Kinship | 2 | 0.83 | 0.98 | 0.91 | 0.73 |
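For reference: MR is the mean rank of the correct entity, MRR the mean reciprocal rank, and Hits@k the fraction of test triples for which the correct entity is ranked within the top k. Below is a minimal sketch of how these metrics follow from a list of per-triple ranks; the ranks themselves are assumed to come from a trained model, and the sample values are toy numbers.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR, and Hits@k from a list of per-triple ranks (1 = best)."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics

# Toy ranks, not real model output:
print(ranking_metrics([1, 2, 5, 40]))
```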
Runtime was benchmarked with an embedding size of 200 and a batch size of 128 on a GTX Titan X (Maxwell). ConvE is also very parameter efficient; the table below compares ConvE and DistMult at different parameter counts:
Parameters | ConvE/DistMult MRR | ConvE/DistMult Hits@10 | ConvE/DistMult Hits@1 |
---|---|---|---|
~5.0M | 0.32 / 0.24 | 0.49 / 0.42 | 0.24 / 0.16 |
1.89M | 0.32 / 0.23 | 0.49 / 0.41 | 0.23 / 0.15 |
0.95M | 0.30 / 0.22 | 0.46 / 0.39 | 0.22 / 0.14 |
0.24M | 0.26 / 0.16 | 0.39 / 0.31 | 0.19 / 0.09 |
ConvE with 8 times fewer parameters is still more powerful than DistMult. Relational Graph Convolutional Networks use roughly 32x more parameters to achieve the same performance as ConvE.
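If you want to sanity-check parameter counts for your own variants, a generic PyTorch helper is enough. The toy embedding table below is only an illustration (FB15k-237 has roughly 14.5k entities), not the actual ConvE module:

```python
import torch

def count_parameters(module: torch.nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Toy illustration: an entity embedding table alone for ~14.5k entities
# at dimension 200 already holds about 2.9M parameters.
toy_table = torch.nn.Embedding(14541, 200)
print(f"{count_parameters(toy_table) / 1e6:.2f}M parameters")
```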
This repo supports Linux and Python installation via Anaconda.
pip install -r requirements.txt
python -m spacy download en_core_web_sm
sh preprocess.sh
Parameters need to be specified as white-space separated tuples, for example:
CUDA_VISIBLE_DEVICES=0 python main.py --model conve --data FB15k-237 \
--input-drop 0.2 --hidden-drop 0.3 --feat-drop 0.2 \
--lr 0.003 --preprocess
will run a ConvE model on FB15k-237.
To run a model, you first need to preprocess the data once. This can be done by specifying the `--preprocess` parameter:
CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME --preprocess
After the dataset is preprocessed it will be saved to disk and this parameter can be omitted.
CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME
The following values can be used for the `--model` parameter:
conve
distmult
complex
The following datasets can be used for the `--data` parameter:
FB15k-237
WN18RR
YAGO3-10
umls
kinship
nations
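For example, to train DistMult on WN18RR (after the dataset has been preprocessed once as described above):
CUDA_VISIBLE_DEVICES=0 python main.py --model distmult --data WN18RR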
And here is the complete list of parameters.
Link prediction for knowledge graphs
optional arguments:
-h, --help show this help message and exit
--batch-size BATCH_SIZE
input batch size for training (default: 128)
--test-batch-size TEST_BATCH_SIZE
input batch size for testing/validation (default: 128)
--epochs EPOCHS number of epochs to train (default: 1000)
--lr LR learning rate (default: 0.003)
--seed S random seed (default: 17)
--log-interval LOG_INTERVAL
how many batches to wait before logging training
status
--data DATA Dataset to use: {FB15k-237, YAGO3-10, WN18RR, umls,
nations, kinship}, default: FB15k-237
--l2 L2 Weight decay value to use in the optimizer. Default:
0.0
--model MODEL Choose from: {conve, distmult, complex}
--embedding-dim EMBEDDING_DIM
The embedding dimension (1D). Default: 200
--embedding-shape1 EMBEDDING_SHAPE1
The first dimension of the reshaped 2D embedding. The
second dimension is inferred. Default: 20
--hidden-drop HIDDEN_DROP
Dropout for the hidden layer. Default: 0.3.
--input-drop INPUT_DROP
Dropout for the input embeddings. Default: 0.2.
--feat-drop FEAT_DROP
Dropout for the convolutional features. Default: 0.2.
--lr-decay LR_DECAY Decay the learning rate by this factor every epoch.
Default: 0.995
--loader-threads LOADER_THREADS
How many loader threads to use for the batch loaders.
Default: 4
--preprocess Preprocess the dataset. Needs to be executed only
once.
--resume Resume a model.
--use-bias Use a bias in the convolutional layer. Default: True
--label-smoothing LABEL_SMOOTHING
Label smoothing value to use. Default: 0.1
--hidden-size HIDDEN_SIZE
The size of the hidden layer. The required size
changes with the size of the embeddings. Default: 9728
(embedding size 200).
To reproduce most of the results in the ConvE paper, you can use the default parameters and execute the command below:
CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME
For the reverse model, you can run the provided script with the dataset name and a threshold probability:
python inverse_model.py WN18RR 0.9
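For intuition only (this is a sketch of the underlying idea, not the actual contents of inverse_model.py): a relation pair is treated as inverse if, for at least the given fraction of training triples (s, r, o), the flipped triple (o, r', s) is also in the training set; test queries involving such relations can then be answered by a simple lookup.

```python
from collections import defaultdict

def find_inverse_relations(triples, threshold=0.9):
    """Return pairs (r, r_inv) where r_inv inverts r for >= threshold of r's triples."""
    by_rel = defaultdict(list)
    triple_set = set(triples)
    relations = {r for _, r, _ in triples}
    for s, r, o in triples:
        by_rel[r].append((s, o))
    inverse_pairs = []
    for r, pairs in by_rel.items():
        for r_inv in relations:
            if r_inv == r:
                continue
            hits = sum((o, r_inv, s) in triple_set for s, o in pairs)
            if hits / len(pairs) >= threshold:
                inverse_pairs.append((r, r_inv))
    return inverse_pairs

# Made-up toy graph, not one of the repo's datasets:
train = [("a", "parent_of", "b"), ("b", "child_of", "a"),
         ("c", "parent_of", "d"), ("d", "child_of", "c")]
print(find_inverse_relations(train, threshold=0.9))
# [('parent_of', 'child_of'), ('child_of', 'parent_of')]
```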
If you want to change the embedding size, you can do that via the `--embedding-dim` parameter. However, for ConvE, since the embedding is reshaped into a 2D embedding, one also needs to pass the first dimension of the reshaped embedding (`--embedding-shape1`); the second dimension is inferred. When one changes the embedding size, the hidden layer size `--hidden-size` also needs to change, but it is difficult to determine before run time. The easiest way to determine the hidden size is to run the model, let it fail with a shape-mismatch error, and then set the hidden size according to the dimension in the error message.
Example: we want to change the embedding size to 100 and use 10x10 2D embeddings. We run `python main.py --embedding-dim 100 --embedding-shape1 10` and hit an error due to the wrong hidden dimension:
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [128 x 4608], m2: [9728 x 100] at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:273
Now we change the hidden dimension to 4608 accordingly: `python main.py --embedding-dim 100 --embedding-shape1 10 --hidden-size 4608`. The model now runs with an embedding size of 100 and 10x10 2D embeddings.
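Assuming the default ConvE architecture from the paper (subject and relation embeddings stacked into one 2D input, 32 convolutional filters with a 3x3 kernel and no padding), the required `--hidden-size` can also be computed up front instead of being read off the error message. This is a sketch under those assumptions, not code from the repo:

```python
def conve_hidden_size(embedding_dim, shape1, filters=32, kernel=3):
    """Flattened size of the conv feature map that feeds the fully connected layer."""
    shape2 = embedding_dim // shape1        # second dimension is inferred
    height = 2 * shape1 - (kernel - 1)      # subject and relation stacked vertically
    width = shape2 - (kernel - 1)           # 3x3 convolution without padding
    return filters * height * width

print(conve_hidden_size(200, 20))  # 9728, the default
print(conve_hidden_size(100, 10))  # 4608, as in the example above
```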
To run the model on a new dataset, copy your dataset folder into the data folder and make sure your dataset split files are named `train.txt`, `valid.txt`, and `test.txt`, each containing tab-separated triples of a knowledge graph. Then execute `python wrangle_KG.py FOLDER_NAME`; afterwards, you can use the folder name of your dataset as the `--data` parameter.
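For reference, each split file should contain one triple per line, with head, relation, and tail separated by tabs; the entity and relation names below are made up:
kyoto	located_in	japan
japan	has_city	kyoto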
You can easily write your own knowledge graph model by extending the barebone model `MyModel` that can be found in the `model.py` file.
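As a rough sketch of what such an extension could look like, assuming `MyModel` follows the same interface as the other models (an `init()` for weight initialization and a `forward(e1, rel)` that scores every entity as a candidate object); the exact constructor signature in your checkout may differ, and the DistMult-style interaction here is only a placeholder for your own scoring function:

```python
import torch

class MyModel(torch.nn.Module):
    """Barebone scorer with a DistMult-style interaction as a placeholder."""

    def __init__(self, num_entities, num_relations, embedding_dim=200):
        super().__init__()
        self.emb_e = torch.nn.Embedding(num_entities, embedding_dim)
        self.emb_rel = torch.nn.Embedding(num_relations, embedding_dim)

    def init(self):
        # Weight initialization, called once before training.
        torch.nn.init.xavier_normal_(self.emb_e.weight)
        torch.nn.init.xavier_normal_(self.emb_rel.weight)

    def forward(self, e1, rel):
        # e1, rel: 1-D LongTensors of subject / relation indices.
        # Returns a (batch x num_entities) score for every candidate object.
        e1_emb = self.emb_e(e1)
        rel_emb = self.emb_rel(rel)
        scores = torch.mm(e1_emb * rel_emb, self.emb_e.weight.t())
        return torch.sigmoid(scores)
```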
There are some quirks of this framework. The evaluation code skips test examples that do not fill a complete batch, so to test on the full test data you can save a checkpoint of a trained model (using the `--resume` parameter) and then evaluate with a batch size that fits the test data (for 220 test examples you could use a batch size of 110). Another solution is to just use a fitting batch size from the start, that is, you could train with a batch size of 110.

It has been noted (issue #6) that WN18RR contains 212 entities in the test set that do not appear in the training set; about 6.7% of the test set is affected. This means that most models will find it impossible to make any reasonable predictions for these entities. This makes WN18RR appear more difficult than it really is, but it should not affect the usefulness of the dataset: if all researchers compare on the same dataset, the scores remain comparable.
Some log files from the original research are included in the repo (logs.tar.gz). These log files have mostly unstructured names and some may have been created from checkpoints, so they can be difficult to interpret. Nevertheless, they might help to replicate the results or to study the training behavior under certain conditions, so I have included them here.
If you found this codebase or our work useful, please cite us:
@inproceedings{dettmers2018conve,
Author = {Dettmers, Tim and Minervini, Pasquale and Stenetorp, Pontus and Riedel, Sebastian},
Booktitle = {Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
Title = {Convolutional 2D Knowledge Graph Embeddings},
Url = {https://arxiv.org/abs/1707.01476},
Year = {2018},
pages = {1811--1818},
Month = {February}
}