
R, Python and Mathematica Codes in Data Science

Welcome to my GitHub repo.

I am a Data Scientist and I code in R, Python and Wolfram Mathematica. Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.

Outputs of the models can be seen in my portfolio: https://drive.google.com/file/d/0B0RLknmL54khdjRQWVBKeTVxSHM/view?usp=sharing


Mathematica Codes

MNIST_HOT.5.FULL: is a solution for the MNIST dataset in Mathematica, with 96.51% accuracy, based on difference of pixels.

Mathematica - Artificial Intelligence Simulating Interactions in Social Networks: is a model that simulates human interactions in a social network using cellular automata and agent-based modeling. Each agent has 3 possible choices for interaction and a memory. The code runs to 14 pages and includes a large loop written as a single line of code.

Mathematica - Facial Recognition in Movement: This code performs facial recognition on a downloaded YouTube video. The output is also a video showing the result of the face recognition (a YouTube link to the output is included in the code page).

Mathematica - Monte Carlo Simulation: is an animated model of a Markov Chain Monte Carlo simulation for autonomous driving. A video of the dynamic output was also generated, and a link to the YouTube video is included in the code page.

Mathematica - Social Network Surveillance: is a model that tracks individuals in a social network, along with their connections and future interactions.


Python Codes

Keras version used in models: keras==1.1.0 | LSTM 0.2

Python - Autoencoder MNIST: is an autoencoder model for classification of images developed with Keras, for the MNIST dataset, with ModelCheckpoint as a callback to save weights.

Python - Autoencoder for Text Classification: is an autoencoder model for classification of text made with Keras, also with ModelCheckpoint.

Python - Deep Learning with Lasagne: is a deep neural network developed with Lasagne, where you can see values of weights in each layer, including bias.

Python - Face Recognition: is a model using OpenCV to detect faces.

Python - Image Extraction from Twitter: is a model that extracts pictures and their links from Twitter webpages, plotting with matplotlib.

Python - Keras Convolutional Neural Network: is a CNN developed to classify the MNIST dataset with an accuracy greater than 99%.

Python - Keras Deep Regressor: is a deep neural network made with Keras for predicting a continuous output, with a learning rate scheduler driven by the derivative of the error, random initial weights, and loss history.

Python - Keras LSTM Network: is a Recurrent Neural Network (LSTM) to predict and generate text.

Python - Keras Multi Layer Perceptron: is an MLP neural network made with Keras for prediction and classification, with loss history and a learning rate scheduled according to the derivative of the error.

Python - Machine Learning: is a Principal Components Analysis followed by a Linear Regression.

Python - NLP Doc2Vec: is a Natural Language Processing model where I asked a Wikipedia webpage a question and 4 possible answers were semantically chosen from the tokenized and vectorized webpage, using KNN and cosine distance.

Python - NLP Semantic Analysis: is a Natural Language Processing model that classifies a given sentence according to semantic similarity to other sentences, using cosine distance.

Python - NLP Word2Vec: is a model developed from scratch to measure cosine similarity among words.
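
Cosine similarity, which the NLP entries above rely on, can be sketched in a few lines. This is a minimal illustration, not the code in this repo: the function name `cosine_similarity` and the 3-dimensional word vectors are hypothetical and chosen only to make the example readable.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical word vectors, for illustration only (not trained embeddings).
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

for word in ("queen", "apple"):
    score = cosine_similarity(vectors["king"], vectors[word])
    print(f"king vs {word}: {score:.3f}")
```

Values close to 1 mean the vectors point in nearly the same direction; values close to 0 mean they are nearly orthogonal.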

Python - Reinforcement Learning: is a model based on simple rules and Game Theory where agents' attitudes change according to the payoff achieved. It can be adapted for tit-for-tat, always cooperate, always defect and other strategies. Rewards were placed in the payoff matrix.
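
To show how rewards in a payoff matrix drive a repeated game, here is a minimal sketch. It is not the model in this repo: the payoff values are the conventional prisoner's dilemma numbers, assumed for illustration, and the strategies are fixed rules rather than agents that adapt to their payoffs.

```python
# Payoff for (my_move, other_move); "C" = cooperate, "D" = defect.
# These values are assumed (conventional prisoner's dilemma payoffs).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def always_cooperate(opponent_history):
    return "C"

def always_defect(opponent_history):
    return "D"

def tit_for_tat(opponent_history):
    # Cooperate on the first round, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Return the total payoffs (a, b) after the given number of rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))     # (9, 14)
print(play(tit_for_tat, always_cooperate))  # (30, 30)
```

Swapping in another strategy only requires a function that takes the opponent's move history and returns "C" or "D".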

Python - Social Networks: is a model that draws the configuration and connections of social networks.

Python - Support Vector Machines: is a Machine Learning model that classifies the Iris dataset with SVM and plots it.

Python - Theano Deep Learning: is a Neural Network with two hidden layers using Theano.


R Codes

R - Churn of Customers: is a model that uses a logistic regression associated with a threshold to predict which customers are at the greatest risk of being lost.

R - Data Cleaning + Multinomial Regression: is a model that presents data cleaning and a multinomial regression using package nnet to classify customers according to their level of loyalty.

R - Face Recognition: is code to detect faces and objects in R.

R - Geolocation Brazil: is a file for geo-spatial localization on a map of Brazil.

R - Geolocation USA: is also a file for geo-spatial localization, on a map of the USA.

R - Geolocation World: is a file for geo-spatial localization on a world map, with zoom available and customizable icons.

R - Gradient Descent Logistic: is a model that performs a gradient descent to define a threshold for the sigmoid function in a Logistic Regression. Boosting was implemented and ROC curves compared.
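
The idea of pairing a gradient-descent logistic regression with a chosen classification threshold can be sketched as follows. This is an illustration in Python rather than the R code in this repo, and it makes two assumptions that should be read as such: the weights are fitted by batch gradient descent on the log loss, and the threshold is picked by a simple scan over candidate cutoffs instead of a gradient or hill-climbing search. The toy data and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. z
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def accuracy(probs, ys, threshold):
    """Share of examples classified correctly at the given cutoff."""
    hits = sum((p >= threshold) == (y == 1) for p, y in zip(probs, ys))
    return hits / len(ys)

def best_threshold(probs, ys):
    """Scan cutoffs 0.05, 0.10, ..., 0.95 and keep the most accurate one."""
    candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates, key=lambda t: accuracy(probs, ys, t))

# Hypothetical one-feature toy data.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
threshold = best_threshold(probs, ys)
print("threshold:", threshold, "accuracy:", accuracy(probs, ys, threshold))
```

With imbalanced classes, accuracy is usually a poor criterion; the same scan could maximize another metric instead.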

R - H2O Deep Learning: is a Neural Network model developed to predict recommendations and word-of-mouth advertising.

R - Imbalanced Classes: is a model for employee churn, where features have no correlation with the target variable and the classes are imbalanced in the proportion 1/20. A logistic regression from scratch is applied, a hill-climbing gradient is used to define the best threshold for the logistic function, and boosting is then compared by AUC in a ROC plot.

Logistic Regression + Gradient Descent + Boosting: is a model where features have no correlation with the target variable. Logistic Regression with Gradient Descent was applied, and then Boosting.

R - MNIST: is a solution for the MNIST dataset, developed from scratch.

R - Markov Chains: is a simple visualization of Markov Chains and their associated probabilities.
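
As a small illustration of the probabilities behind a Markov chain, the sketch below propagates a state distribution through a transition matrix. It is written in Python rather than R, and the two-state weather chain and its probabilities are hypothetical.

```python
# Hypothetical two-state chain; row i holds the probabilities of moving
# from state i to each state, so every row sums to 1.
STATES = ["sunny", "rainy"]
TRANSITION = [
    [0.9, 0.1],
    [0.5, 0.5],
]

def step(distribution, transition):
    """Advance a probability distribution over states by one step."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]

def distribution_after(start, transition, steps):
    """Distribution over states after the given number of steps."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, transition)
    return dist

start = [1.0, 0.0]  # certainly sunny today
print(distribution_after(start, TRANSITION, 1))   # [0.9, 0.1]
print(distribution_after(start, TRANSITION, 50))  # approaches [5/6, 1/6]
```

After enough steps the distribution stops changing; for this chain it settles at 5/6 sunny and 1/6 rainy regardless of the starting state.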

R - NeuralNet: is a Neural Network model developed to predict and classify word-of-mouth advertising.

R - Ridge Regression: is a model with Ridge Regularization made from scratch to prevent overfitting.
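
How ridge regularization shrinks a coefficient can be seen in a one-feature sketch. This is an illustration in Python, not the R code in this repo, and it assumes a single feature with an unpenalized intercept; the function name `fit_ridge` and the toy data are hypothetical.

```python
def fit_ridge(xs, ys, lam):
    """Single-feature ridge regression with an unpenalized intercept.

    Minimizes sum((y - w*x - b)**2) + lam * w**2, which has the closed form
    w = Sxy / (Sxx + lam) on centered data and b = mean(y) - w * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / (s_xx + lam)
    b = mean_y - w * mean_x
    return w, b

# Hypothetical toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(fit_ridge(xs, ys, 0.0))  # (2.0, 0.0): no penalty, ordinary least squares
print(fit_ridge(xs, ys, 5.0))  # (1.0, 2.5): the slope is shrunk toward zero
```

A larger penalty `lam` pulls the slope further toward zero, trading some fit on the training data for lower variance.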

R - Deep Learning: is a Neural Network model with 2 hidden layers for prediction of a continuous variable.

