
Text Classification Benchmark

A Benchmark of Text Classification in PyTorch

Motivation

We are building a benchmark for text classification that includes:

Many text classification datasets, covering sentiment and topic classification in popular languages (e.g. English and Chinese). A basic word embedding is also provided.

Implementations of many popular and state-of-the-art models, especially deep neural networks.

Done so far

Some datasets and models are already implemented.

Datasets done

  • IMDB
  • SST
  • TREC

Models done

  • FastText
  • BasicCNN (KimCNN, MultiLayerCNN, Multi-perspective CNN)
  • InceptionCNN
  • LSTM (BiLSTM, StackLSTM)
  • LSTM with Attention (Self Attention / Quantum Attention)
  • Hybrids between CNN and RNN (RCNN, C-LSTM)
  • Transformer - Attention is all you need
  • ConvS2S
  • Capsule
  • Quantum-inspired NN

Libraries

You should have these libraries installed:

python3
torch
torchtext (optional)

Dataset

Datasets are configured automatically in the current path. Alternatively, download your data manually into Dataset, step by step.

This includes:

GloVe embedding
Sentiment classification dataset IMDB
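
If you download the GloVe vectors manually, the sketch below shows one way to read them. It assumes the standard GloVe plain-text layout (one token per line, followed by its space-separated vector components). The function name `load_glove` is hypothetical and is not part of this repository; the in-memory sample stands in for a downloaded file.

```python
import io


def load_glove(handle):
    """Parse GloVe's plain-text format: a token followed by its vector on each line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample standing in for a downloaded GloVe text file.
sample = io.StringIO("the 0.1 0.2 0.3\nmovie 0.4 0.5 0.6\n")
embeddings = load_glove(sample)
print(len(embeddings), len(embeddings["movie"]))
```

For a real file, pass an open file handle instead, for example `open(path, encoding="utf-8")`, where `path` points to the file you downloaded.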

Usage

Run with the default settings

python main.py

CNN

python main.py --model cnn

LSTM

python main.py --model lstm
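
The commands above select a model with a `--model` flag. As a rough illustration of how such a flag can be parsed with the standard `argparse` module, here is a minimal sketch. The list of choices and the default value are assumptions for the example; the actual options live in `opts.py` and may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse a --model flag; passing argv=None reads from the command line."""
    parser = argparse.ArgumentParser(description="Text classification benchmark (sketch)")
    # Assumption: only "cnn" and "lstm" are shown in this README, and the
    # default used here is a guess, not the repository's real default.
    parser.add_argument("--model", default="cnn", choices=["cnn", "lstm"],
                        help="name of the model to train")
    return parser.parse_args(argv)


args = parse_args(["--model", "lstm"])
print(args.model)
```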

Road Map

  • [X] Data preprocessing framework
  • [X] Models modules
  • [ ] Loss, Estimator and hyper-parameter tuning.
  • [ ] Test modules
  • [ ] More datasets
  • [ ] More models

Organisation of the repository

The core of this repository is the models and the datasets.

  • dataloader/: loads all datasets, such as IMDB and SST

  • models/: creates all models, such as FastText, LSTM, CNN, Capsule, QuantumCNN, and Multi-Head Attention

  • opts.py: parameters and configuration.

  • utils.py: utility tools.

  • dataHelper: data helper
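
One common way to connect a model name given on the command line to a class in a package like `models/` is a small registry. The sketch below illustrates that pattern only; it is not taken from this repository. `MODEL_REGISTRY`, `register`, `build_model`, and the placeholder classes are all hypothetical names, and the placeholders stand in for real PyTorch modules.

```python
MODEL_REGISTRY = {}


def register(name):
    """Class decorator that records a model class under a short name."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register("cnn")
class CNNPlaceholder:
    # Hypothetical stand-in for a real CNN model class.
    def __init__(self, num_classes):
        self.num_classes = num_classes


@register("lstm")
class LSTMPlaceholder:
    # Hypothetical stand-in for a real LSTM model class.
    def __init__(self, num_classes):
        self.num_classes = num_classes


def build_model(name, **kwargs):
    """Look up a registered class by name and instantiate it."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model {name!r}; available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**kwargs)


model = build_model("lstm", num_classes=2)
print(type(model).__name__, model.num_classes)
```

With this layout, adding a model means defining one new class and decorating it; the lookup code stays unchanged.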

Contributors

Issues and contributions are welcome!

