Awesome Open Source


Implementation of a seq2seq model for speech recognition. Architecture similar to "Listen, Attend and Spell".


Created: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
Actual: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
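
The output above is a list of character tokens in which blanks are written as `<SPACE>`. A minimal sketch of how a transcript could be converted to and from that representation is shown here. The helper names `text_to_tokens` and `tokens_to_text` are hypothetical and are not taken from the repository's utilities.

```python
SPACE_TOKEN = "<SPACE>"  # matches the token shown in the example output


def text_to_tokens(text):
    """Split a transcript into upper-case character tokens (hypothetical helper)."""
    return [SPACE_TOKEN if ch == " " else ch for ch in text.upper()]


def tokens_to_text(tokens):
    """Inverse of text_to_tokens: join tokens back into a plain string."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in tokens)


tokens = text_to_tokens("seventeen twenty four")
print(tokens)
print(tokens_to_text(tokens))
```

Comparing the predicted and actual token lists element by element then gives a simple per-character check of a transcription.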


  • TensorFlow
  • numpy
  • pandas
  • librosa
  • python_speech_features


I used the LibriSpeech dataset, which contains about 1000 hours of read English speech sampled at 16 kHz. It is available here:


I uploaded three .py files and one .ipynb file. The .py files contain the network implementation and utilities. The Jupyter Notebook is a demo of how to apply the model.


Seq2Seq model
As mentioned above, the model architecture is similar to the one used in "Listen, Attend and Spell": the encoder is built from pyramidal bidirectional LSTMs. Each pyramid layer reduces the time resolution of its input, so the decoder attends over a shorter sequence, which improves performance on longer utterances.
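
To illustrate the time reduction, here is a minimal sketch of the pyramid step alone, without the LSTM layers. It assumes the reduction is done by concatenating each pair of consecutive frames, as described in the "Listen, Attend and Spell" paper, and it assumes a trailing odd frame is dropped. The function name `pyramid_reduce` is hypothetical; the repository's implementation may differ.

```python
def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames.

    frames: list of feature vectors (lists of floats), one per time step.
    Returns a sequence half as long whose vectors are twice as wide.
    A trailing odd frame is dropped (an assumption of this sketch).
    """
    usable = len(frames) - len(frames) % 2
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


# 8 time steps with 3 features each
frames = [[float(t)] * 3 for t in range(8)]

reduced = frames
for _ in range(3):  # three pyramid layers give an 8x shorter sequence
    reduced = pyramid_reduce(reduced)
    print(len(reduced), len(reduced[0]))
```

In the full encoder a bidirectional LSTM would sit between the reduction steps, so the feature width would be set by the LSTM size rather than growing as it does in this sketch.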

  • Encoder-Decoder
  • Pyramidal Bidirectional LSTM
  • Bahdanau Attention
  • Adam Optimizer
  • Exponential or cyclic learning rate
  • Beam Search or Greedy Decoding
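
The two learning rate options in the list can be sketched as plain functions of the training step. This is an illustration under assumptions, not the repository's code: the exponential schedule uses the common `initial_lr * decay_rate ** (step / decay_steps)` form, the cyclic schedule uses the triangular policy, and all default values are made up for the example.

```python
import math


def exponential_lr(step, initial_lr=1e-3, decay_rate=0.96, decay_steps=1000):
    """Smooth exponential decay (assumed form; defaults are illustrative)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def triangular_cyclic_lr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Triangular cyclic schedule: rises linearly from base_lr to max_lr over
    step_size steps, then falls back over the next step_size steps."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


for step in (0, 1000, 2000, 3000, 4000):
    print(step, exponential_lr(step), triangular_cyclic_lr(step))
```

Either function can be evaluated once per training step and the result passed to the Adam optimizer as its learning rate.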
