This is a survey of deep learning models for text classification; it will be updated frequently with testing and evaluation on different datasets.
Natural Language Processing tasks (part-of-speech tagging, chunking, named entity recognition, text classification, etc.) have gone through a tremendous amount of research over the decades. Text classification has been the most competed NLP task on Kaggle and in other similar competitions. Count-based models are being phased out, with new deep learning models emerging almost every month. This project is an attempt to survey most of the neural models for the text classification task. The selected models, based on CNNs and RNNs, are explained with code (Keras and TensorFlow) and block diagrams. The models are evaluated on a medical dataset from a Kaggle competition.
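Since the surveyed models are CNN- and RNN-based, the core CNN-for-text operation (a filter sliding over token embeddings, followed by max-pooling over time) can be sketched in plain Python. This is only an illustration of the idea, not code from the project: the vocabulary, embedding values, and filter weights below are made up, whereas in a real model the embeddings and many filters are learned.

```python
import random

random.seed(0)

# Toy vocabulary and random embeddings (hypothetical values for illustration).
EMBED_DIM = 4
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "bad": 4}
embeddings = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in vocab]

def conv1d_max_pool(tokens, kernel, width):
    """Slide one width-n filter over the token embeddings and max-pool.

    Each window of `width` consecutive embedding vectors is flattened and
    dotted with the filter weights; max-pooling keeps the strongest response.
    """
    seq = [embeddings[vocab[t]] for t in tokens]
    activations = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]
        activations.append(sum(w * x for w, x in zip(kernel, window)))
    return max(activations)

# One random filter of width 2; a real model stacks many learned filters
# and feeds the pooled features into a classifier layer.
kernel = [random.uniform(-1, 1) for _ in range(2 * EMBED_DIM)]
feature = conv1d_max_pool(["the", "movie", "was", "great"], kernel, 2)
print(round(feature, 3))
```

Each filter produces one pooled feature per sentence; concatenating the features from many filters of different widths gives the fixed-length vector that the final classification layer consumes.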
Update: Non-stop training and power issues in my geographic location burned out my motherboard. By the time I had gone through 2 RMAs with ASRock and gotten the system up and running again, the competition was over :( but I still learned a lot.
Setup (assuming Anaconda is installed at ~/Programs/anaconda3):

1. Create a directory for environments: cd ~/Programs/anaconda3 && mkdir envs
2. Create the project environment: cd envs && ../bin/conda create -p ~/Programs/anaconda3/envs/dsotc-c3 python=3.6 anaconda
3. Activate the environment: source /home/bicepjai/Programs/anaconda3/envs/dsotc-c3/bin/activate dsotc-c3
4. Install pip into the environment using conda install pip, then install all dependencies: pip install -r requirements.txt (Anaconda has issues with pip, so use the full path ~/Programs/anaconda3/envs/dsotc-c3/bin/pip)
5. Install and enable the Jupyter notebook extensions:
   jupyter contrib nbextension install --user
   jupyter nbextensions_configurator enable --user
   Extensions used: Collapsible Headings, ExecuteTime, Table of Contents
Now we should be ready to run this project and perform reproducible research. The details regarding the machine used for training can be found here
Version reference for some important packages used
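As a quick sanity check after installation, a small helper like the following (not part of the original repo) can report the installed versions of the packages the project depends on:

```python
import importlib

def package_version(name):
    """Return a package's version string, "unknown" if it exposes no
    __version__ attribute, or None if it is not installed."""
    try:
        module = importlib.import_module(name)
        return getattr(module, "__version__", "unknown")
    except ImportError:
        return None

# Packages mentioned in the survey text; extend with requirements.txt entries.
for pkg in ["numpy", "tensorflow", "keras"]:
    print(pkg, package_version(pkg))
```

Running this inside the dsotc-c3 environment should print a version for each dependency; a None indicates the pip install step did not complete.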
Details regarding the data used can be found here
This project is complete and the documentation can be found here. The papers explored in this project: