Awesome Open Source

Text-Classification-Models-Pytorch

Implementations of state-of-the-art text classification models in PyTorch

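Implemented Models

  • fastText [1]
  • TextCNN [2]
  • TextRNN
  • RCNN [3]
  • CharCNN [4]
  • Seq2Seq_Attention [5], [6]
  • Transformer [7]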

Requirements

  • Python-3.5.0
  • Pandas-0.23.4
  • Numpy-1.15.2
  • Spacy-2.0.13
  • Pytorch-0.4.1.post2
  • Torchtext-0.3.1

Usage

  1. Download the data into the "data/" directory, or use the data already provided there
  2. If you use your own data, convert it to the same format as the provided data
  3. Download pre-trained word embeddings (GloVe/Word2Vec) into the "data/" directory
  4. Go to the directory of the model you want to run
  5. Run the following command:

python train.py <path_to_training_file> <path_to_test_file>
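
Step 3 relies on pre-trained word vectors. As a rough illustration only (this is not the repository's own loading code, which may well go through Torchtext), the sketch below reads a GloVe-style text file, in which each line holds a token followed by its space-separated vector components. The function name `load_glove_text` and the file path in the usage comment are hypothetical.

```python
def load_glove_text(path, max_words=None):
    """Read a GloVe-style text file.

    Returns (word_to_idx, vectors): a dict mapping each token to a row index,
    and a list of float lists with one row per token.
    """
    word_to_idx = {}
    vectors = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            word_to_idx[word] = len(vectors)
            vectors.append([float(v) for v in values])
            if max_words is not None and len(vectors) >= max_words:
                break  # optional cap to limit memory use
    return word_to_idx, vectors


# Hypothetical usage; the file name is an assumption, not a path from the project:
# word_to_idx, vectors = load_glove_text("data/glove.txt", max_words=50000)
```

The resulting rows can be converted to a NumPy array or a PyTorch tensor before being copied into an embedding layer.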

Model Performance

  • All models were run on a 14 GB machine with 2 cores and one NVIDIA Tesla K80 GPU.
  • The runtime in the table below includes the time to load and process the data as well as the time to run the model.
  • Model parameters were not tuned, so better performance can likely be achieved with some parameter tuning.
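
The two reported quantities can be measured along the following lines. This is a minimal sketch under stated assumptions, not the project's measurement code: `accuracy_percent` and `timed_minutes` are hypothetical helper names, and `run_pipeline` in the usage comment stands in for whatever function loads the data, processes it, and runs the model.

```python
import time


def accuracy_percent(predictions, labels):
    """Share of predictions that equal the gold labels, as a percentage."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def timed_minutes(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock time in minutes)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_seconds = time.perf_counter() - start
    return result, elapsed_seconds / 60.0


# Hypothetical usage: time the whole pipeline so that data loading and
# processing are included in the runtime, as described above.
# acc, minutes = timed_minutes(run_pipeline, train_path, test_path)
```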

Model                         AG_News                          Query_Well_formedness
                              Accuracy (%)    Runtime (min)    Accuracy (%)    Runtime (min)
fastText                      89.46           16.0             62.10           7.0
TextCNN                       88.57           17.2             67.38           7.43
TextRNN (Seq len = 20)        88.07           21.5             68.29           7.69
TextRNN (Flexible seq len)    90.43           36.8             66.29           7.25
RCNN                          90.61           22.73            66.70           7.21
CharCNN                       87.70           13.08            68.83           2.49
Seq2Seq_Attention             90.26           19.10            67.84           7.36
Transformer                   88.54           46.47            63.43           5.77
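
The two TextRNN results differ in how sequence length is handled. The source does not spell out the mechanism, so the sketch below shows one plausible reading as an assumption: "Seq len = 20" truncates or pads every example to exactly 20 tokens, while "Flexible seq len" pads each batch only up to its longest example. The helper names and the padding index `PAD` are hypothetical.

```python
PAD = 0  # hypothetical index reserved for the padding token


def pad_fixed(batch, seq_len=20, pad=PAD):
    """Truncate or right-pad every sequence to exactly seq_len tokens."""
    padded = []
    for seq in batch:
        clipped = seq[:seq_len]
        padded.append(clipped + [pad] * (seq_len - len(clipped)))
    return padded


def pad_flexible(batch, pad=PAD):
    """Right-pad every sequence to the longest sequence in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad] * (longest - len(seq)) for seq in batch]


batch = [[5, 8, 2], [7] * 25]          # token ids for two toy examples
fixed = pad_fixed(batch)               # both rows have length 20; the long one is cut
flexible = pad_flexible(batch)         # both rows have length 25; nothing is cut
```

A fixed length keeps every batch the same shape and discards tokens beyond the limit. Flexible lengths keep all tokens but make batches as long as their longest example, which would be consistent with the longer AG_News runtime reported for that setting.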

References

[1] Bag of Tricks for Efficient Text Classification
[2] Convolutional Neural Networks for Sentence Classification
[3] Recurrent Convolutional Neural Networks for Text Classification
[4] Character-level Convolutional Networks for Text Classification
[5] Neural Machine Translation by Jointly Learning to Align and Translate
[6] Text Classification Research with Attention-based Recurrent Neural Networks
[7] Attention Is All You Need
[8] Rethinking the Inception Architecture for Computer Vision
[9] Identifying Well-formed Natural Language Questions

