The Transformer model in "Attention Is All You Need": a Keras implementation.
A Keras+TensorFlow implementation of the Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017).
Usage
Please refer to en2de_main.py and pinyin_main.py
en2de_main.py
Results
- The code achieves results close to those in the repository: about 70% validation accuracy.
With smaller model parameters, such as layers=2 and d_model=256, the validation accuracy is better, since the task is quite small.
For your own data
- Just preprocess your source and target sequences into the format used in en2de.s2s.txt and pinyin.corpus.examples.txt.
Some notes
- For a larger number of layers, the special learning rate scheduler reported in the paper is necessary.
- In pinyin_main.py, I tried another method to train the deep network: I train the first layer and the embedding layer first, then a 2-layer model, then a 3-layer model, and so on. It works for this task.
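The learning rate schedule from the paper (Section 5.3) grows the rate linearly during a warmup period and then decays it with the inverse square root of the step number. Below is a minimal sketch in plain Python, not the code used in this repository; the function name is hypothetical, and the defaults d_model=512 and warmup_steps=4000 are the paper's base-model settings.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """Learning rate schedule from "Attention Is All You Need", Section 5.3.

    lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

    `noam_learning_rate` is a hypothetical name used for illustration only.
    """
    step = max(step, 1)  # steps are 1-based; guard against step == 0
    warmup_term = step * warmup_steps ** -1.5  # linear increase during warmup
    decay_term = step ** -0.5                  # inverse square-root decay afterwards
    return d_model ** -0.5 * min(warmup_term, decay_term)


if __name__ == "__main__":
    # Print the rate at a few steps to see the warmup and the decay.
    for s in (1, 1000, 4000, 16000):
        print(s, noam_learning_rate(s))
```

The rate peaks at step == warmup_steps. Note that Keras's built-in LearningRateScheduler callback updates once per epoch, so applying this per training step would need a per-batch hook or an equivalent mechanism.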
Upgrades
- Refactored some classes.
- It is easier to use the components in other models: just import transformer.py.
- A fast step-by-step decoder is added, including an upgraded beam search. They still need to be modified to be reusable.
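For readers unfamiliar with beam search, the general idea can be sketched as follows. This is a generic, framework-free illustration and not the decoder in this repository: `beam_search`, `step_fn`, and the toy model are all hypothetical names, and no length normalization is applied.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Generic beam search over log-probabilities.

    step_fn(prefix) must return a list of (token, log_prob) pairs for the
    next token given the prefix (a list of tokens). Hypothetical interface.
    Returns the highest-scoring token sequence found.
    """
    beams = [(0.0, [start_token])]  # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = [
            (score + logp, seq + [token])
            for score, seq in beams
            for token, logp in step_fn(seq)
        ]
        if not candidates:
            break
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token distribution in which the greedy choice is not the best sequence.
TOY_MODEL = {
    "<s>": [("a", 0.6), ("b", 0.4)],
    "a": [("</s>", 0.55), ("c", 0.45)],
    "b": [("</s>", 0.9), ("c", 0.1)],
    "c": [("</s>", 1.0)],
}


def toy_step(prefix):
    """Look up next-token probabilities from the last token only."""
    return [(tok, math.log(p)) for tok, p in TOY_MODEL[prefix[-1]]]


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_width=1))  # greedy decoding
    print(beam_search(toy_step, "<s>", "</s>", beam_width=2))
```

With beam_width=1 the search is greedy and commits to the locally best first token; a wider beam keeps alternatives alive and can recover a sequence with higher overall probability.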
Acknowledgement