
TensorFlow Implementation of End-to-End Speech Recognition

Requirements

  • TensorFlow >= 1.3.0
  • tqdm >= 4.14.0
  • python-Levenshtein >= 0.12.0
  • setproctitle >= 1.1.10
  • seaborn >= 0.7.1

Corpus

TIMIT

  • Phone (39, 48, 61 phones)
  • Character

LibriSpeech

  • Phone (under implementation)
  • Character
  • Word

CSJ (Corpus of Spontaneous Japanese)

  • Phone (under implementation)
  • Japanese kana character (about 150 classes)
  • Japanese kanji characters (about 3000 classes)

The following corpora will be added in the future.

  • Switchboard
  • WSJ
  • AMI

This repository does not include pre-processing code. Pre-processing is based on this repo; please refer to it if you want to pre-process a corpus yourself.

Model

Encoder

  • BLSTM
  • LSTM
  • BGRU
  • GRU
  • VGG-BLSTM
  • VGG-LSTM
  • Multi-task BLSTM
    • You can attach an additional CTC layer to an arbitrary layer.
  • Multi-task LSTM
  • VGG

Connectionist Temporal Classification (CTC) [Graves+ 2006]

  • Greedy decoder
  • Beam Search decoder
  • Beam Search decoder w/ CharLM (under implementation)
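
For readers unfamiliar with CTC decoding, here is a minimal pure-Python sketch of greedy (best-path) decoding. It is not taken from this repository: the function name `ctc_greedy_decode` and the default blank index of 0 are assumptions made for illustration (TensorFlow's own CTC ops treat the last class index as the blank).

```python
def ctc_greedy_decode(posteriors, blank=0):
    """Best-path CTC decoding for a single utterance.

    posteriors: list of per-frame score lists (frames x classes).
    blank: index of the CTC blank label (assumed to be 0 in this sketch).
    """
    # Pick the highest-scoring class in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in posteriors]

    decoded = []
    previous = None
    for label in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.70, 0.20],  # label 1 again (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # label 1 (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # label 2
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Beam search differs in that it keeps several candidate label sequences per frame instead of only the single best class.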
Options
  • Frame-stacking [Sak+ 2015]
  • Multi-GPUs training (synchronous)
  • Splicing
  • Down sampling (under implementation)
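
As a rough illustration of frame-stacking, the sketch below concatenates consecutive feature frames and advances by a fixed skip, which lowers the frame rate seen by the encoder. The function name `stack_frames`, the default values, and the padding policy (repeating the last frame) are assumptions for this example and may differ from what the repository does.

```python
def stack_frames(features, num_stack=3, num_skip=3):
    """Stack `num_stack` consecutive frames, advancing `num_skip` frames per step.

    features: list of frames, each frame a list of feature values.
    Returns a shorter list of frames, each `num_stack` times wider.
    """
    stacked = []
    for start in range(0, len(features), num_skip):
        window = features[start:start + num_stack]
        # Assumption: pad an incomplete final window by repeating the last frame.
        while len(window) < num_stack:
            window = window + [features[-1]]
        # Flatten the window into one wide frame.
        stacked.append([value for frame in window for value in frame])
    return stacked


# 7 frames of 2-dimensional features
features = [[t, t * 10] for t in range(7)]
for frame in stack_frames(features):
    print(frame)
# [0, 0, 1, 10, 2, 20]
# [3, 30, 4, 40, 5, 50]
# [6, 60, 6, 60, 6, 60]
```

With `num_skip=1`, the same function keeps the original frame rate and only widens each frame with its right-hand context, which is closer in spirit to splicing.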

Attention Mechanism

Decoder
  • Greedy decoder
  • Beam search decoder (under implementation)
Attention type
  • Bahdanau's content-based attention
  • Bahdanau's normed content-based attention (under implementation)
  • Location-based attention
  • Hybrid attention
  • Luong's dot attention
  • Luong's scaled dot attention (under implementation)
  • Luong's general attention
  • Luong's concat attention
  • Baidu's attention (under implementation)
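
To make the attention types above more concrete, here is a minimal pure-Python sketch of the simplest one, Luong-style dot attention, where the score is the dot product of the decoder state with each encoder state. The names `softmax` and `dot_attention` are hypothetical helpers written for this example, not functions from this repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # score(h_t, h_s) = h_t . h_s for every encoder state h_s
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_attention([2.0, 0.0], encoder_states)
print(weights)
print(context)
```

The other variants change only how the score is computed: for example, general attention inserts a learned matrix between the two states, and content-based (Bahdanau) attention passes them through a small feed-forward layer.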
Options
  • Sharpening
  • Temperature regularization in the softmax layer (Output posteriors)
  • Joint CTC-Attention [Kim 2016]
  • Coverage (under implementation)
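
The sharpening and temperature options both come down to rescaling scores before a softmax. The sketch below shows one common formulation, dividing the scores by a temperature T; the function name `softmax_with_temperature` is an assumption for illustration, and the repository's exact parameterization may differ.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature.

    temperature < 1 sharpens the distribution (more peaked);
    temperature > 1 flattens it (closer to uniform).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
base = softmax_with_temperature(logits, temperature=1.0)
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print(base, sharp, flat)
```

Applied to attention scores, a temperature below 1 concentrates the weights on fewer encoder frames; applied to the output layer, a temperature above 1 smooths the output posteriors.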

Usage

Please refer to the docs for each corpus.

  • TIMIT
  • LibriSpeech
  • CSJ

License

MIT

Contact

[email protected]

