| Project Name | Description | Stars | Downloads | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
|---|---|---|---|---|---|---|---|---|---|
| Easyocr | Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic. | 19,436 | 67 | 17 days ago | 31 | September 20, 2022 | 298 | apache-2.0 | Python |
| Screenshot To Code | A neural network that transforms a design mock-up into a static website. | 14,121 | | a year ago | | | 17 | other | HTML |
| Deeplearning | Introductory deep learning tutorials and selected articles (Deep Learning Tutorial). | 7,463 | | a year ago | | | 8 | apache-2.0 | Jupyter Notebook |
| Machine Learning Collection | A resource for learning about machine learning and deep learning. | 5,693 | | a month ago | | | 92 | mit | Python |
| Stockpredictionai | A notebook building a complete pipeline for predicting stock price movements with a Generative Adversarial Network (GAN): an LSTM, a type of recurrent neural network, serves as the generator, and a Convolutional Neural Network (CNN) as the discriminator. LSTM is the natural choice for time-series prediction; the reasons for using a GAN with a CNN discriminator are covered in dedicated sections. | 3,235 | | 2 years ago | | | 320 | | |
| Keras Resources | Directory of tutorials and open-source code repositories for working with Keras, the Python deep learning library. | 3,174 | | 10 months ago | | | 13 | | |
| Pytorch Sentiment Analysis | Tutorials on getting started with PyTorch and TorchText for sentiment analysis. | 2,905 | | 2 years ago | | | 16 | mit | Jupyter Notebook |
| Automatic_speech_recognition | End-to-end automatic speech recognition for Mandarin and English in TensorFlow. | 2,743 | | 2 years ago | | | 69 | mit | Python |
| Tensorflow_template_application | TensorFlow template application for deep learning. | 1,868 | | 3 months ago | | | 14 | apache-2.0 | Python |
| Ncrfpp | NCRF++, a neural sequence labeling toolkit that is easy to apply to any sequence labeling task (e.g. NER, POS tagging, segmentation); includes character LSTM/CNN, word LSTM/CNN, and softmax/CRF components. | 1,862 | | a year ago | 1 | March 16, 2022 | 5 | apache-2.0 | Python |
Update (21 Sept. 2018): I don't actively maintain this repository. This work was done for a course project, and the dataset cannot be released because I don't own the copyright. However, everything in this repository can easily be modified to work with other datasets. I recommend reading the (sloppily written) project report, which can be found in `docs/`.
We use and compare various methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of the form `tweet_id,sentiment,tweet`, where `tweet_id` is a unique integer identifying the tweet, `sentiment` is either `1` (positive) or `0` (negative), and `tweet` is the tweet text enclosed in `""`. Similarly, the test dataset is a csv file of the form `tweet_id,tweet`. Please note that csv headers are not expected and should be removed from both the training and test datasets.
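For illustration, here is a minimal sketch (not part of the repository; the helper name is hypothetical) of reading the headerless training csv described above:

```python
import csv

def load_training_csv(path):
    """Read headerless rows of the form tweet_id,sentiment,tweet."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for tweet_id, sentiment, tweet in csv.reader(f):
            rows.append((int(tweet_id), int(sentiment), tweet))
    return rows
```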
There are some general library requirements for the project, and some which are specific to individual methods. The general requirements are as follows:

* `numpy`
* `scikit-learn`
* `scipy`
* `nltk`

The library requirements specific to some methods are:

* `keras` with `TensorFlow` backend for Logistic Regression, MLP, RNN (LSTM), and CNN.
* `xgboost` for XGBoost.

Note: It is recommended to use the Anaconda distribution of Python.
1. Run `preprocess.py <raw-csv-path>` on both train and test data. This will generate a preprocessed version of the dataset.
2. Run `stats.py <preprocessed-csv-path>`, where `<preprocessed-csv-path>` is the path of the csv generated by `preprocess.py`. This gives general statistical information about the dataset and will generate two pickle files containing the frequency distributions of unigrams and bigrams in the training dataset.

After the above steps, you should have four files in total: `<preprocessed-train-csv>`, `<preprocessed-test-csv>`, `<freqdist>`, and `<freqdist-bi>`, which are the preprocessed train dataset, the preprocessed test dataset, the frequency distribution of unigrams, and the frequency distribution of bigrams, respectively.
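As a quick sanity check, the generated pickles can be inspected along these lines. This is a sketch only: it assumes the files hold dict-like objects mapping n-grams to counts (the exact structure written by `stats.py` may differ), and the file names are placeholders:

```python
import pickle

# Load the unigram and bigram frequency distributions (paths are placeholders).
with open("train-processed-freqdist.pkl", "rb") as f:
    unigrams = pickle.load(f)
with open("train-processed-freqdist-bi.pkl", "rb") as f:
    bigrams = pickle.load(f)

# Show the ten most frequent unigrams, assuming a dict-like mapping.
print(sorted(unigrams.items(), key=lambda kv: kv[1], reverse=True)[:10])
print(len(bigrams), "distinct bigrams")
```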
For all the methods that follow, change the values of `TRAIN_PROCESSED_FILE`, `TEST_PROCESSED_FILE`, `FREQ_DIST_FILE`, and `BI_FREQ_DIST_FILE` to your own paths in the respective files. Wherever applicable, the values of `USE_BIGRAMS` and `FEAT_TYPE` can be changed to obtain results using different types of features, as described in the report.
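For example, the configuration block at the top of a method's script might be edited along these lines (all paths and the `FEAT_TYPE` value shown here are placeholders, not values taken from the repository):

```python
# Placeholder paths: point these at the files produced by the preprocessing steps.
TRAIN_PROCESSED_FILE = "data/train-processed.csv"
TEST_PROCESSED_FILE = "data/test-processed.csv"
FREQ_DIST_FILE = "data/train-processed-freqdist.pkl"
BI_FREQ_DIST_FILE = "data/train-processed-freqdist-bi.pkl"

USE_BIGRAMS = True       # include bigram features where supported
FEAT_TYPE = "frequency"  # hypothetical value; see the report for the actual options
```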
3. Run `baseline.py`. With `TRAIN = True` it will show the accuracy results on the training dataset.
4. Run `naivebayes.py`. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
5. Run `logistic.py` to run the logistic regression model, OR run `maxent-nltk.py <>` to run NLTK's MaxEnt model. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
6. Run `decisiontree.py`. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
7. Run `randomforest.py`. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
8. Run `xgboost.py`. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
9. Run `svm.py`. With `TRAIN = True` it will show the accuracy results on the 10% validation dataset.
10. Run `neuralnet.py`. It will validate using 10% of the data and save the best model to `best_mlp_model.h5`.
11. Run `lstm.py`. It will validate using 10% of the data and save the model for each epoch in `./models/`. (Please make sure this directory exists before running `lstm.py`.)
12. Run `cnn.py`. This will run the 4-Conv-NN (4 conv layers neural network) model as described in the report. To run other versions of the CNN, just comment out or remove the lines where Conv layers are added. It will validate using 10% of the data and save the model for each epoch in `./models/`. (Please make sure this directory exists before running `cnn.py`.)
13. Run `extract-cnn-feats.py <saved-model>`. This will generate 3 files: `train-feats.npy`, `train-labels.txt`, and `test-feats.npy`.
14. Run `cnn-feats-svm.py`, which uses the files from the previous step to perform SVM classification on the features extracted from the CNN model.
15. Place the prediction csv files generated by the above methods in `./results/` and run `majority-voting.py`. This will generate `majority-voting.csv` (a minimal sketch of the voting scheme follows this list).
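For intuition, here is a minimal sketch of the hard majority-voting idea; the headerless `tweet_id,prediction` layout assumed for the per-method csv files is an illustration, not necessarily the repository's actual format:

```python
import csv
import glob
from collections import Counter

# Gather every method's predictions (assumed layout: headerless
# "tweet_id,prediction" rows in each csv under ./results/).
votes = {}
for path in glob.glob("./results/*.csv"):
    with open(path, newline="") as f:
        for tweet_id, pred in csv.reader(f):
            votes.setdefault(tweet_id, []).append(pred)

# Keep the most common label per tweet and write the ensemble result.
with open("majority-voting.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for tweet_id, preds in votes.items():
        writer.writerow([tweet_id, Counter(preds).most_common(1)[0][0]])
```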
Other files in the repository:

* `dataset/positive-words.txt`: List of positive words.
* `dataset/negative-words.txt`: List of negative words.
* `dataset/glove-seeds.txt`: GloVe word vectors from StanfordNLP which match our dataset, used for seeding word embeddings (see the sketch after this list).
* `Plots.ipynb`: IPython notebook used to generate the plots present in the report.
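To show how a seed file like `dataset/glove-seeds.txt` is typically consumed, here is a sketch of building an embedding matrix from a GloVe-format text file (one token followed by its vector per line); the `vocab` mapping and the 200-dimension size are assumptions, not repository values:

```python
import numpy as np

def build_embedding_matrix(glove_path, vocab, dim=200):
    """Seed an embedding matrix from a GloVe-format file.

    vocab maps token -> row index. Tokens absent from the file keep
    a small random initialization; dim=200 is an assumed size.
    """
    matrix = np.random.uniform(-0.1, 0.1, (len(vocab), dim))
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            token, vec = parts[0], parts[1:]
            if token in vocab and len(vec) == dim:
                matrix[vocab[token]] = np.asarray(vec, dtype=np.float32)
    return matrix
```

Such a matrix can then be passed as the initial weights of an embedding layer (e.g. Keras's `Embedding(..., weights=[matrix])`) so that training starts from the GloVe seeds rather than from random vectors.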