Awesome Open Source

The Data Science Codex

A collection of code and resources to serve as a starting point for data science projects. For more explanation and material on R, visit my blog.


Data Visualization

Statistical Modeling and Machine Learning

  • Modeling Fundamentals (R) - A primer on logistic and linear regression modeling with the classic Titanic dataset.
  • Survival Analysis (R) - Survival analysis methods such as Cox proportional hazards models and Kaplan-Meier curves.
  • Modeling Workflows (R) - Streamlined Tidyverse modeling workflows with the gapminder dataset.
  • Multilevel Models (R) - Multilevel (a.k.a. mixed-effects) models.
  • Time Series Modeling (R) - Experimenting with time series modeling (tsibble, forecast libraries, prophet, etc.)
  • Ordinal Regression (R) - Experimenting with ordinal (ranked categorical outcome) regression
  • Presenting Regression Models (R) - Code for cleaning the outputs of regression models for presentations.
  • Sklearn Modeling Workflows (Python) - Modeling workflows with sklearn (cross-validation, randomized search for optimizing hyperparameters, lift curves).
  • Sklearn - Skopt Workflow (Python) - Modeling workflow with sklearn and scikit-optimize (Bayesian hyperparameter optimization).
  • Machine Learning with Caret (R) - Using the Caret library for machine learning.
  • Parsnip (R) - Fitting models with the parsnip package (from tidymodels).
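
As a rough illustration of the Kaplan-Meier curves mentioned under Survival Analysis above, here is a minimal sketch in plain Python. It is not taken from the linked notebook. The function name `kaplan_meier` and the input layout (parallel lists of durations and 0/1 event flags) are assumptions made for this example.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event or censoring for each subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    (Hypothetical helper for illustration only.)
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Subjects whose event was observed exactly at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        # Everyone who leaves the risk set at t (events and censored).
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit update: multiply by the share surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects only shrink the risk set. They never lower the curve themselves, which is what sets this estimator apart from a naive proportion of survivors.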

Bayesian Models


  • k-means clustering (R) - Using the k-means algorithm to cluster data.
  • Clustering (Python) - Agglomerative (Hierarchical) clustering, k-means clustering, and Gaussian mixture models
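
For readers new to the k-means algorithm listed above, the following is a minimal standard-library sketch of Lloyd's algorithm. It is not the code from the linked notebooks, which may rely on other libraries. The function name `kmeans` and its parameters are assumptions made for this example.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples with Lloyd's algorithm.

    Returns (centroids, labels). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

The result depends on the random initialisation. On real data it is common to run several seeds and keep the solution with the lowest within-cluster sum of squares.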

Stats Analysis


  • Document Embeddings (Python) - Using word embeddings to compare the similarity of State of the Union addresses.
  • State of the Union Analysis (Python) - An exploration of State of the Union addresses with topic modeling and sentiment analysis.
  • Sentiment Analysis (R) - Exploring sentiment analysis in R.
  • LSTM Demo (Python) - An LSTM network for predicting whether a company review from Glassdoor is positive.

