Awesome Open Source

End-to-End Speech Recognition using RNN-Transducer

File description

  • RNNT joint model decoding
  • RNNT model, containing the acoustic model and the phoneme model
  • RNNT model following Graves2012
  • seq2seq/*: seq2seq with attention
  • RNNT loss function implemented in MXNet, supporting both the Symbol and Gluon APIs; refers to a PyTorch implementation
  • data processing
  • RNNT training script; can be initialized from CTC and PM models
  • CTC training script
  • attention training script
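
For orientation, the RNNT loss listed above is the negative log-likelihood defined in Graves2012, which can be computed with a forward recursion over the (time, label) lattice. The following is a minimal NumPy sketch of that recursion, not the repository's MXNet code. The function name `rnnt_neg_log_likelihood`, the `(T, U + 1, V)` layout of `log_probs`, and the blank index 0 are assumptions made for illustration.

```python
import numpy as np

def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under a transducer lattice.

    log_probs: array of shape (T, U + 1, V) holding log-softmax outputs of
               the joint network (assumed layout, for illustration only).
    labels:    sequence of U label ids, none equal to `blank`.
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    assert U1 == U + 1, "label axis must have len(labels) + 1 entries"

    # alpha[t, u]: log-probability of having emitted labels[:u]
    # by the time frame t is being read.
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            candidates = []
            if t > 0:
                # Blank emitted at (t - 1, u): advance one frame.
                candidates.append(alpha[t - 1, u] + log_probs[t - 1, u, blank])
            if u > 0:
                # Label emitted at (t, u - 1): advance one label.
                candidates.append(
                    alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
                )
            alpha[t, u] = np.logaddexp.reduce(candidates)

    # A final blank from the last lattice node terminates the sequence.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])
```

The double loop is quadratic in `T * U` and written for readability; a practical implementation would vectorize it or run it on the GPU and would also need the backward pass for gradients.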

Directory description

  • conf: Kaldi feature extraction config

Reference Paper


Run

  • Compile RNNT loss: follow the instructions in here to compile MXNet with RNNT loss.

  • Extract features: link the Kaldi TIMIT example directories (local, steps, utils), execute the extraction script to extract 40-dim fbank features, then run the transform step to get the 123-dim features described in Graves2013.

  • Train RNNT model:

python --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
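
As background for the feature extraction step above: Graves2013 describes 40 filterbank coefficients plus energy, together with their first and second temporal derivatives, which gives 3 × 41 = 123 dimensions. A minimal NumPy sketch of appending such derivatives follows. It is an illustration only, not the repository's Kaldi pipeline; the helper name `add_deltas`, the regression window of 2 frames, and the edge padding are assumptions.

```python
import numpy as np

def add_deltas(feats, window=2):
    """Append first and second temporal derivatives to each frame.

    feats: array of shape (num_frames, dim), e.g. dim = 41 for
           40 fbank coefficients plus energy (assumed input layout).
    Returns an array of shape (num_frames, 3 * dim).
    """
    feats = np.asarray(feats, dtype=float)

    def delta(x):
        # Regression-based derivative over +/- `window` frames,
        # repeating the first and last frame at the edges.
        n_frames = len(x)
        padded = np.pad(x, ((window, window), (0, 0)), mode="edge")
        denom = 2.0 * sum(n * n for n in range(1, window + 1))
        out = np.zeros_like(x, dtype=float)
        for n in range(1, window + 1):
            ahead = padded[window + n : window + n + n_frames]
            behind = padded[window - n : window - n + n_frames]
            out += n * (ahead - behind)
        return out / denom

    d1 = delta(feats)   # first derivative
    d2 = delta(d1)      # second derivative
    return np.concatenate([feats, d1, d2], axis=1)
```

With a `(num_frames, 41)` input this returns a `(num_frames, 123)` array.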


Evaluate

By default, evaluation is supported only for RNNT.

  • Greedy decoding:
python <path to best model parameters> --bi
  • Beam search:
python <path to best model parameters> --bi --beam <beam size>
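
To illustrate what the greedy decoding mode mentioned above does for a transducer, here is a minimal sketch under stated assumptions. It is not the repository's decoder: `predict_step` and `joint` are hypothetical callables standing in for the prediction network and the joint network, blank is assumed to be index 0, and the `max_symbols` cap per frame is an added safeguard against endless emission.

```python
import numpy as np

def greedy_decode(encoder_out, predict_step, joint, blank=0, max_symbols=10):
    """Greedy transducer decoding (illustrative sketch).

    encoder_out:  array of shape (T, H), one acoustic vector per frame.
    predict_step: hypothetical callable (label, state) -> (pred_vec, state).
    joint:        hypothetical callable (enc_vec, pred_vec) -> scores over V.
    """
    hyp = []
    # Prime the prediction network with the blank symbol.
    pred, state = predict_step(blank, None)
    for enc in encoder_out:
        # Emit labels on this frame until blank wins (or the cap is hit),
        # then move on to the next frame.
        for _ in range(max_symbols):
            k = int(np.argmax(joint(enc, pred)))
            if k == blank:
                break
            hyp.append(k)
            # The prediction network only advances on non-blank labels.
            pred, state = predict_step(k, state)
    return hyp
```

Beam search keeps several such hypotheses alive per frame instead of the single best one, which is why it costs more and is controlled by a beam size.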


Results

  • CTC

    Decode      PER
    greedy      20.36
    beam 100    20.03

  • Transducer

    Decode      PER
    greedy      20.74
    beam 40     19.84
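
For readers unfamiliar with the metric, and assuming PER here denotes phone error rate in percent, it is commonly computed as the total edit distance between reference and hypothesis phone sequences divided by the total reference length. The sketch below shows that computation; the function names are hypothetical and this is not the scoring script used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phone labels."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]

def phone_error_rate(refs, hyps):
    """Corpus-level error rate in percent over paired sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones.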


Requirements

  • Python 3.6
  • MXNet 1.1.0
  • numpy 1.14


TODO

  • beam search acceleration
  • Seq2Seq with attention
