Our implementation is largely based on the TensorFlow implementation.
I'm new to PyTorch, so I have been implementing some projects with it to learn. Recently, I read the paper *Attention Is All You Need* and was impressed by the idea, so here it is. I got results similar to those of the original TensorFlow implementation.
I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas of the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper.
- `hyperparams.py` includes all hyperparameters that are needed.
- `prepro.py` creates vocabulary files for the source and the target.
- `data_load.py` contains functions for loading and batching data.
- `modules.py` has all building blocks for the encoder/decoder networks (see the sketch after this list).
- `train.py` has the model.
- `eval.py` is for evaluation.
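For context, the central building block behind `modules.py` is the scaled dot-product attention from the paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal PyTorch sketch of that operation; it is illustrative only and not the repository's actual code (tensor shapes and the masking convention are assumptions).

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v assumed shaped [batch, heads, seq_len, d_k]; mask is 0 where attention is blocked.
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights

# Tiny usage example with random tensors.
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 8, 10, 64]) torch.Size([2, 8, 10, 10])
```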
1. Download and extract the de-en parallel corpus: `wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora`
2. Run `prepro.py` to generate the vocabulary files.
3. Run `train.py`, or download the pretrained weights, put them into the folder `./models/`, and change the corresponding setting.
4. Monitor training with `tensorboard --logdir runs` (a minimal logging sketch follows).
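The curves shown by TensorBoard come from logs written under `runs`. As a rough illustration only (not the repository's actual logging code), scalars can be written with PyTorch's built-in `SummaryWriter`:

```python
# Illustrative only: how training loss could be logged to ./runs for TensorBoard.
# The repository's train.py may log differently; the tag name and values are made up.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs")
for step, loss in enumerate([4.0, 3.2, 2.7]):  # dummy loss values
    writer.add_scalar("train/loss", loss, global_step=step)
writer.close()
```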
I got a BLEU score of 16.7 (the TensorFlow implementation gets 17.14). Recall that I trained with a small dataset and a limited vocabulary. Some of the evaluation results are as follows; more details are available in the repository.
source: Ich bin nicht sicher was ich antworten soll
expected: I'm not really sure about the answer
got: I'm not sure what I'm going to answer
source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference
source: Vielen Dank
expected: Thank you
got: Thank you
source: Das ist ein Baum
expected: This is a tree
got: So this is a tree
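For reference, BLEU for a single pair like the last example above can be estimated with NLTK. This is a hedged sketch, not the repository's evaluation code; `eval.py` may compute BLEU differently, and the reported 16.7 is a corpus-level score rather than a sentence-level one.

```python
# Illustrative sentence-level BLEU with NLTK; not the repository's evaluation code.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "This is a tree".lower().split()       # expected translation
hypothesis = "So this is a tree".lower().split()   # model output
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score * 100:.2f}")
```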