Awesome Open Source


This repository contains PyTorch implementations of sequence-to-sequence models for machine translation. The code is based on fairseq and has been deliberately simplified for readability, while main features such as multi-GPU training and beam search remain intact.

Two encoder-decoder models are implemented in this repository: a classic model based on LSTM networks with an attention mechanism (Bahdanau et al.) and the Transformer, a more recent model built entirely from self-attention (Vaswani et al.).
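As a rough illustration of the self-attention building block, here is a minimal sketch of scaled dot-product attention as described by Vaswani et al. It is written in plain Python rather than PyTorch so that it runs without dependencies, and it is not taken from this repository; the function names `softmax` and `attention` are chosen for illustration only. Note that the LSTM model's attention (Bahdanau et al.) uses an additive scoring function instead of the dot product shown here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    queries and keys hold vectors of the same dimension d_k;
    keys and values must contain the same number of vectors.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
# A zero query scores both keys equally, so the result is the plain average.
print(attention([[0.0, 0.0]], keys, values))  # [[2.0, 3.0]]
```

A query that points strongly in the direction of one key receives almost all of its weight from the matching value, which is how each position selects the context it attends to.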



The code was written for Python 3.6 or higher and has been tested with PyTorch 0.4.1. Training requires a GPU. To get started, clone the repository:

git clone
cd machine-translation


To download the IWSLT'14 DE-EN dataset and tokenize it, run:


Then build the dictionaries and map tokens to indices:

python --source-lang de --target-lang en --train-prefix $DATA_PATH/train --valid-prefix $DATA_PATH/valid --test-prefix $DATA_PATH/test --dest-dir data-bin/
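To make this preprocessing step concrete, the sketch below shows one common way to build a dictionary and map tokens to indices. It is an illustration under assumptions, not the repository's implementation: the special symbols (`<pad>`, `</s>`, `<unk>`), the frequency-based ordering, and the function names `build_dictionary` and `binarize` are all hypothetical.

```python
from collections import Counter

# Hypothetical special symbols; the repository's actual choices may differ.
PAD, EOS, UNK = "<pad>", "</s>", "<unk>"


def build_dictionary(sentences, threshold=1):
    """Map every token seen at least `threshold` times to an integer index."""
    counts = Counter(tok for line in sentences for tok in line.split())
    word2idx = {PAD: 0, EOS: 1, UNK: 2}
    # Frequent tokens get the smallest indices; ties are broken alphabetically.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= threshold and tok not in word2idx:
            word2idx[tok] = len(word2idx)
    return word2idx


def binarize(sentence, word2idx):
    """Convert a tokenized sentence to indices and append end-of-sentence."""
    unk = word2idx[UNK]
    return [word2idx.get(tok, unk) for tok in sentence.split()] + [word2idx[EOS]]


corpus = ["das ist gut", "das ist ein test"]
vocab = build_dictionary(corpus)
# "neu" is not in the training corpus, so it maps to the unknown index 2.
print(binarize("das ist neu", vocab))  # [3, 4, 2, 1]
```

Building the dictionary from the training split only, and mapping unseen validation or test tokens to the unknown symbol, keeps the evaluation data from influencing the vocabulary.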


To train a model on the preprocessed IWSLT'14 DE-EN data, run the following command:

python --data data-bin/ --source-lang de --target-lang en --lr 0.25 --clip-norm 0.1 --max-tokens 12000 --save-dir checkpoints/transformer
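The `--max-tokens` option is not explained here. Assuming it keeps the meaning it has in fairseq (an upper bound on the number of tokens in a batch, padding included), batching could be sketched as follows. The function name `batch_by_tokens` and the sort-by-length strategy are assumptions made for illustration, not the repository's code.

```python
def batch_by_tokens(lengths, max_tokens):
    """Group sentence indices so each padded batch has at most max_tokens tokens."""
    # Sorting by length puts similar-length sentences together (less padding).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current = [], []
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"sentence {idx} is longer than max_tokens")
        # Lengths are ascending, so the new sentence is the longest in the
        # batch and the padded size would be (batch size + 1) * its length.
        if current and (len(current) + 1) * lengths[idx] > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


lengths = [5, 3, 9, 4, 8]
print(batch_by_tokens(lengths, max_tokens=16))  # [[1, 3, 0], [4], [2]]
```

Bounding batches by token count rather than sentence count keeps GPU memory use roughly constant whether a batch holds many short sentences or a few long ones.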


When the training is done, you can make predictions and compute BLEU scores:

python --data data-bin/ --checkpoint-path checkpoints/transformer/ > /tmp/transformer.out
grep ^H /tmp/transformer.out | cut -f2- | sed -r 's/'$(echo -e "\033")'\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]//g' > /tmp/transformer.sys
grep ^T /tmp/transformer.out | cut -f2- | sed -r 's/'$(echo -e "\033")'\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]//g' > /tmp/transformer.ref
python --reference /tmp/transformer.ref --system /tmp/transformer.sys
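For readers who prefer to see the post-processing spelled out, the sketch below mirrors the `grep | cut | sed` pipeline in Python and then computes an unsmoothed corpus-level BLEU score. It rests on assumptions: the sample output format (`H-0`, `T-0` followed by a tab) is hypothetical, the helper names `extract`, `ngrams`, and `corpus_bleu` are illustrative, and the repository's own scoring script may tokenize or smooth differently, so its numbers need not match.

```python
import math
import re
from collections import Counter

# Same ANSI color/erase sequences that the sed expression strips.
ANSI = re.compile(r"\x1b\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]")


def extract(lines, prefix):
    """Keep lines starting with `prefix`, drop the first tab-separated field."""
    out = []
    for line in lines:
        if line.startswith(prefix):
            fields = line.rstrip("\n").split("\t")
            # Like `cut -f2-`, a line without any tab is kept whole.
            rest = "\t".join(fields[1:]) if len(fields) > 1 else fields[0]
            out.append(ANSI.sub("", rest))
    return out


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0 to 100) with one reference per hypothesis."""
    matches, totals = [0] * max_n, [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = ref.split(), hyp.split()
        ref_len += len(ref_toks)
        hyp_len += len(hyp_toks)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref_toks, n), ngrams(hyp_toks, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = min(1.0, math.exp(1 - ref_len / hyp_len))
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical generation output, for illustration only.
sample = [
    "S-0\tdie katze sass auf der matte\n",
    "T-0\tthe cat sat on the mat\n",
    "H-0\t\x1b[32mthe cat sat on mat\x1b[0m\n",
]
refs, hyps = extract(sample, "T"), extract(sample, "H")
print(hyps)  # ['the cat sat on mat']
print(corpus_bleu(refs, hyps))
```

BLEU is computed over the whole corpus rather than averaged per sentence, which is why the n-gram counts and lengths are accumulated before the precisions are formed.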
