Awesome Open Source

The Transformer model in "Attention Is All You Need": a Keras implementation.

A Keras+TensorFlow implementation of the Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)


Please refer to and


  • The code achieves results close to those in the repository: about 70% validation accuracy. With smaller model parameters, such as layers=2 and d_model=256, the validation accuracy is better, since the task is quite small.

For your own data

  • Just preprocess your source and target sequences into the format used in en2de.s2s.txt and pinyin.corpus.examples.txt.
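
As a rough illustration of the preprocessing step, the sketch below writes tokenized sentence pairs to a text file. The layout is an assumption, not something stated here: one pair per line, source and target separated by a tab, tokens separated by spaces. Check en2de.s2s.txt itself before relying on it. The helper name `write_pairs` is hypothetical.

```python
def write_pairs(pairs, path):
    """Write (source_tokens, target_tokens) pairs, one pair per line.

    ASSUMPTION: tab between source and target, spaces between tokens.
    Verify this against the example files shipped with the repository.
    """
    with open(path, "w", encoding="utf-8") as f:
        for src_tokens, tgt_tokens in pairs:
            f.write(" ".join(src_tokens) + "\t" + " ".join(tgt_tokens) + "\n")


pairs = [
    (["a", "small", "dog", "."], ["ein", "kleiner", "Hund", "."]),
    (["thank", "you"], ["danke"]),
]
write_pairs(pairs, "my_corpus.s2s.txt")
```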

Some notes

  • For a larger number of layers, the special learning rate scheduler reported in the paper is necessary.
  • In, I tried another method to train the deep network: I train the first layer and the embedding layer first, then a 2-layer model, then a 3-layer model, and so on. It works in this task.
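
For reference, the learning rate schedule described in the paper (Section 5.3) rises linearly during warmup and then decays with the inverse square root of the step number. The function below is a minimal, framework-independent sketch of that formula. The name `noam_learning_rate` and the default values (taken from the paper's base model) are illustrative and are not this repository's API.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)  # guard against step 0, where step^-0.5 is undefined
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate grows during warmup, peaks at warmup_steps, then decays.
for s in (1, 1000, 4000, 16000):
    print(s, noam_learning_rate(s))
```

The schedule is defined per training step, so a callback that fires once per epoch would need adapting before it could apply it.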


  • Some classes have been restructured.
  • The components are now easier to use in other models, just import
  • A fast step-by-step decoder has been added, including an upgraded beam search. Both still need to be modified to be reusable.
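
To make the idea of step-by-step decoding with beam search concrete, here is a small generic sketch. It is not the repository's decoder: `step_fn` is a hypothetical stand-in for one decoder step that returns log-probabilities for the next token given a prefix, and `toy_step` is a made-up model used only to show the behaviour. There is no length normalization.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the best (tokens, log_prob) hypothesis found by beam search.

    step_fn(prefix) -> dict of next token -> log-probability (hypothetical).
    """
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = []
        for tokens, score in beams:
            for token, logp in step_fn(tokens).items():
                candidates.append((tokens + [token], score + logp))
        if not candidates:
            break
        # Keep the beam_width best; move completed ones to `finished`.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])


def toy_step(prefix):
    """Made-up model: 'a' looks better first, but 'b' ends more confidently."""
    if len(prefix) == 1:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix[-1] == "a":
        return {"</s>": math.log(0.5), "a": math.log(0.5)}
    return {"</s>": math.log(0.9), "b": math.log(0.1)}


greedy_tokens, _ = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam_tokens, _ = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy_tokens, beam_tokens)
```

With a beam width of 1 the search is greedy and commits to the locally best first token. A wider beam keeps the alternative alive and can find a sequence with higher total probability.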

