
TF-seq2seq

Sequence-to-sequence (seq2seq) learning using TensorFlow.

The core building blocks are RNN encoder-decoder architectures and attention mechanisms.

The package is largely implemented with the following tf.contrib.seq2seq modules from TensorFlow 1.2 (the latest release at the time of writing):

  • AttentionWrapper
  • Decoder
  • BasicDecoder
  • BeamSearchDecoder

The package supports:

  • Multi-layer GRU/LSTM
  • Residual connections
  • Dropout
  • Attention and input feeding
  • Beam search decoding
  • Writing n-best lists
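Residual connections and dropout are standard techniques for deep stacked RNNs. The plain-Python sketch below shows one way they can combine in a stack of layers. The helper names, the toy layer functions, skipping the residual on the first layer, and applying dropout before the residual addition are all assumptions for illustration, not this package's TensorFlow code.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dropout(x: Vector, rate: float, rng: random.Random) -> Vector:
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if rate <= 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def stacked_forward(layers: List[Layer], x: Vector, use_residual: bool = True,
                    dropout_rate: float = 0.0,
                    rng: Optional[random.Random] = None) -> Vector:
    """Run `x` through `layers` in order (hypothetical helper, not the package API)."""
    rng = rng or random.Random(0)
    for i, layer in enumerate(layers):
        y = dropout(layer(x), dropout_rate, rng)  # dropout on the layer output
        # Residual connection: add the layer's input to its output.
        # Assumption: the first layer is skipped because its input (the
        # embedding) may have a different size than the hidden state.
        if use_residual and i > 0:
            y = [a + b for a, b in zip(y, x)]
        x = y
    return x


# Two toy "layers" standing in for RNN layers.
layers: List[Layer] = [lambda v: [2.0 * a for a in v], lambda v: [a + 1.0 for a in v]]
print(stacked_forward(layers, [1.0, 2.0]))                      # [5.0, 9.0]
print(stacked_forward(layers, [1.0, 2.0], use_residual=False))  # [3.0, 5.0]
```

The residual path gives gradients a shortcut around each layer, which is the usual motivation for enabling it in deeper stacks.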

Dependencies

  • NumPy >= 1.11.1
  • TensorFlow >= 1.2

History

  • June 5, 2017: Major update
  • June 6, 2017: Supports batched beam search decoding
  • June 11, 2017: Separated training and decoding
  • June 22, 2017: Supports TensorFlow 1.2 (contrib.rnn -> python.ops.rnn_cell)

Usage Instructions

Data Preparation

To preprocess the raw parallel data in sample_data.src and sample_data.trg, run:

cd data/
./preprocess.sh src trg sample_data ${max_seq_len}

The script performs the following preprocessing steps, which are widely used in Machine Translation (MT):

  • Normalizing punctuation
  • Tokenizing
  • Byte-pair encoding with 30,000 merge operations (Sennrich et al., 2016)
  • Removing sequences longer than ${max_seq_len}
  • Shuffling
  • Building dictionaries
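The byte-pair encoding step is handled by the bundled subword-nmt scripts. As a rough illustration only, the merge-learning loop described by Sennrich et al. (2016) can be sketched in plain Python: repeatedly count adjacent symbol pairs across the vocabulary and merge the most frequent one. The function names are hypothetical, and the real preprocessing learns 30,000 merges over the whole training corpus rather than 3 merges over a toy vocabulary.

```python
import collections
import re
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pair_counts(vocab: Dict[str, int]) -> Dict[Pair, int]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Dict[Pair, int] = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            counts[left, right] += freq
    return counts


def merge_pair(pair: Pair, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge operations from word frequencies."""
    # Split each word into characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": c for word, c in word_counts.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # most frequent pair (first one on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
# [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merges are then applied in the same order to segment new text, so frequent words end up as single symbols and rare words are split into subword units.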

Training

To train a seq2seq model:

$ python train.py --cell_type 'lstm' \
                  --attention_type 'luong' \
                  --hidden_units 1024 \
                  --depth 2 \
                  --embedding_size 500 \
                  --num_encoder_symbols 30000 \
                  --num_decoder_symbols 30000 ...

Decoding

To run a trained model for decoding (here $PATH_TO_A_MODEL_CHECKPOINT is a checkpoint path such as model/translate.ckpt-100):

$ python decode.py --beam_width 5 \
                   --decode_batch_size 30 \
                   --model_path $PATH_TO_A_MODEL_CHECKPOINT \
                   --max_decode_step 300 \
                   --write_n_best False \
                   --decode_input $PATH_TO_DECODE_INPUT \
                   --decode_output $PATH_TO_DECODE_OUTPUT

With --beam_width 1, decoding is greedy: the single most probable token is selected at each time step.
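To make the relation between --beam_width and greedy decoding concrete, here is a minimal plain-Python beam search over a hypothetical step function that returns next-token log-probabilities. It is a textbook sketch, not the package's tf.contrib.seq2seq BeamSearchDecoder, which may differ in details such as batching and how finished hypotheses are kept; the toy model and all names are made up. With beam_width set to 1 only the single best extension survives each step, which is exactly greedy decoding, and the returned list plays the role of an n-best list.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[int], float]  # (token ids, total log-probability)
StepFn = Callable[[List[int]], Dict[int, float]]  # prefix -> {next token: log-prob}


def beam_search(step_fn: StepFn, bos: int, eos: int, beam_width: int,
                max_decode_step: int) -> List[Hypothesis]:
    """Return up to `beam_width` hypotheses, best first (an n-best list)."""
    beams: List[Hypothesis] = [([bos], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_decode_step):
        # Extend every live hypothesis by every candidate next token.
        candidates = [(tokens + [tok], score + logp)
                      for tokens, score in beams
                      for tok, logp in step_fn(tokens).items()]
        # Keep the `beam_width` best extensions; those ending in EOS are done.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when the step limit is hit
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Toy model: token 0 = BOS, 1 = EOS, 2 = "a", 3 = "b".
TOY_MODEL = {
    (0,): {2: math.log(0.6), 3: math.log(0.4)},
    (0, 2): {1: math.log(0.4), 2: math.log(0.3), 3: math.log(0.3)},
    (0, 3): {1: math.log(0.9), 2: math.log(0.1)},
}


def toy_step(tokens: List[int]) -> Dict[int, float]:
    return TOY_MODEL[tuple(tokens)]


print(beam_search(toy_step, bos=0, eos=1, beam_width=1, max_decode_step=10))
print(beam_search(toy_step, bos=0, eos=1, beam_width=2, max_decode_step=10))
```

With beam_width=1 the search returns [0, 2, 1] (probability 0.24). With beam_width=2 it also keeps "b" after the first step and finds [0, 3, 1] (probability 0.36), a better sequence that greedy decoding misses.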

Arguments

Data params

  • --source_vocabulary : Path to source vocabulary
  • --target_vocabulary : Path to target vocabulary
  • --source_train_data : Path to source training data
  • --target_train_data : Path to target training data
  • --source_valid_data : Path to source validation data
  • --target_valid_data : Path to target validation data

Network params

  • --cell_type : RNN cell to use for the encoder and decoder: gru or lstm (default: lstm)
  • --attention_type : Attention mechanism: bahdanau or luong (default: bahdanau)
  • --hidden_units : Number of hidden units in each layer of the model
  • --depth : Number of layers in the encoder and decoder (default: 2)
  • --embedding_size : Embedding dimensions of encoder and decoder inputs (default: 500)
  • --num_encoder_symbols : Source vocabulary size to use (default: 30000)
  • --num_decoder_symbols : Target vocabulary size to use (default: 30000)
  • --use_residual : Use residual connection between layers (default: True)
  • --attn_input_feeding : Use input feeding in the attentional decoder (Luong et al., 2015) (default: True)
  • --use_dropout : Apply dropout to RNN cell outputs (default: True)
  • --dropout_rate : Dropout probability for cell outputs (0.0: no dropout) (default: 0.3)
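The two --attention_type choices differ in how a decoder state (the query) is scored against each encoder output (the key): Luong-style attention uses a multiplicative score and Bahdanau-style attention an additive one. In both, the scores are softmax-normalized into alignment weights and the context vector is the weighted sum of encoder outputs. The plain-Python sketch below implements the textbook formulas with made-up toy weights; the package itself relies on TensorFlow's LuongAttention and BahdanauAttention classes with learned weights, which may include options not shown here. With --attn_input_feeding, the resulting attentional vector is also fed into the decoder input at the next step (not shown).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_score(query: Vector, key: Vector, w_a: Matrix) -> float:
    """Multiplicative ("general") score: query^T W_a key."""
    return dot(query, matvec(w_a, key))


def bahdanau_score(query: Vector, key: Vector, w_q: Matrix, w_k: Matrix,
                   v: Vector) -> float:
    """Additive score: v^T tanh(W_q query + W_k key)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(w_q, query), matvec(w_k, key))]
    return dot(v, hidden)


def attend(query: Vector, memory: List[Vector],
           score_fn: Callable[[Vector, Vector], float]) -> Tuple[Vector, Vector]:
    """Return (alignment weights, context vector) over the encoder outputs."""
    weights = softmax([score_fn(query, h_s) for h_s in memory])
    context = [sum(a * h_s[d] for a, h_s in zip(weights, memory))
               for d in range(len(memory[0]))]
    return weights, context


# Made-up 2-d encoder outputs and decoder state; identity weights for simplicity.
identity = [[1.0, 0.0], [0.0, 1.0]]
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
print(attend(query, memory, lambda q, k: luong_score(q, k, identity)))
print(attend(query, memory,
             lambda q, k: bahdanau_score(q, k, identity, identity, [1.0, 1.0])))
```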

Training params

  • --learning_rate : Learning rate (default: 0.0002)
  • --max_gradient_norm : Clip gradients to this norm (default: 1.0)
  • --batch_size : Batch size
  • --max_epochs : Maximum training epochs
  • --max_load_batches : Maximum number of batches to prefetch at one time.
  • --max_seq_length : Maximum sequence length
  • --display_freq : Number of iterations between training status displays
  • --save_freq : Number of iterations between model checkpoint saves
  • --valid_freq : Number of iterations between evaluations on the validation data (requires --source_valid_data and --target_valid_data)
  • --optimizer : Optimizer for training: adadelta, adam, or rmsprop (default: adam)
  • --model_dir : Path to save model checkpoints
  • --model_name : File name used for model checkpoints
  • --shuffle_each_epoch : Shuffle training dataset for each epoch (default: True)
  • --sort_by_length : Sort pre-fetched minibatches by their target sequence lengths (default: True)
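--max_gradient_norm caps the size of each parameter update. A common way to do this, and the formula documented for tf.clip_by_global_norm, is to rescale all gradients together by clip_norm / max(global_norm, clip_norm). Whether this package clips by global norm rather than per tensor is an assumption; the plain-Python sketch below only shows the arithmetic on toy gradients, with a hypothetical function name.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[List[float]],
                        clip_norm: float) -> Tuple[List[List[float]], float]:
    """Rescale all gradients jointly so their global L2 norm is at most clip_norm.

    Each gradient g becomes g * clip_norm / max(global_norm, clip_norm), so
    nothing changes when the global norm is already within the limit.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6], [0.8]]
```

Scaling every gradient by the same factor preserves the direction of the overall update and only shrinks its length, which is why global-norm clipping is a popular default for RNN training.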

Decoding params

  • --beam_width : Beam width used in beam search (default: 1)
  • --decode_batch_size : Batch size used in decoding
  • --max_decode_step : Maximum number of decoding time steps (default: 500)
  • --write_n_best : Write the beam search n-best list (n = beam_width) (default: False)
  • --decode_input : Path of the input file to decode
  • --decode_output : Path of the file to write the decoding output to

Runtime params

  • --allow_soft_placement : Allow device soft placement
  • --log_device_placement : Log placement of ops on devices

Acknowledgements

The implementation is based on the following projects:

  • nematus: Theano implementation of Neural Machine Translation; the major reference for this project
  • subword-nmt: Subword-unit scripts, included to preprocess input data
  • moses: Preprocessing scripts, included to preprocess input data
  • tf.seq2seq_legacy: Legacy TensorFlow seq2seq tutorial
  • tf_tutorial_plus: Nice tutorials for tf.contrib.seq2seq API

For comments and feedback, please email me at [email protected] or open an issue here.
