Awesome Open Source

See the Fall 2020 tidymodels update!

Machine Learning in R

This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. View the associated slides here.

RStudio Binder: Binder

Content outline

  • Background on machine learning
    • Classification vs regression
    • Performance metrics
  • Data preprocessing
    • Missing data
    • Train/test splits
  • Algorithm walkthroughs
    • Lasso
    • Decision trees
    • Random forests
    • Gradient boosted machines
    • SuperLearner ensembling
    • Principal component analysis
    • Hierarchical agglomerative clustering
  • Challenge questions

Getting started

Please follow the notes in


The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to function in a standalone manner.

After installing and loading the packages in 01-overview.Rmd, run all the code in 02-preprocessing.Rmd to preprocess the data. Then open any one of the seven algorithm R Markdown files and choose "Run All" to see the results and visualizations!
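If you want to check your setup before running 01-overview.Rmd, the sketch below shows one way to see which packages are missing and to attach the ones that are present. The package list is an assumption made for illustration only (the authoritative list is in 01-overview.Rmd), and the helper names `missing_packages` and `setup_packages` are hypothetical, not part of the workshop materials.

```r
# Hypothetical package list for illustration; see 01-overview.Rmd for the real one.
workshop_pkgs <- c("glmnet", "rpart", "ranger", "xgboost", "SuperLearner")

# Return the packages in `pkgs` that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Optionally install anything missing, then attach every package that is available.
# Installation is off by default so that calling this never downloads unexpectedly.
setup_packages <- function(pkgs, install = FALSE) {
  to_install <- missing_packages(pkgs)
  if (install && length(to_install) > 0) {
    install.packages(to_install)
  }
  available <- setdiff(pkgs, missing_packages(pkgs))
  for (p in available) {
    library(p, character.only = TRUE)
  }
  invisible(available)
}
```

Calling `missing_packages(workshop_pkgs)` lists anything you still need to install, and `setup_packages(workshop_pkgs, install = TRUE)` installs and attaches in one step.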

Assumed participant background

We assume that participants have familiarity with:

  • Basic R syntax
  • Statistical concepts such as mean and standard deviation

Technology requirements

Please bring a laptop with the following:


Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!


The slides were made using xaringan, which is a wrapper for remark.js. Check out Chapter 7 if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests - with an example implementation in R.
