See the Fall 2020 tidymodels update!

Machine Learning in R

This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. View the associated slides here.

RStudio Binder: Binder

Content outline

  • Background on machine learning
    • Classification vs regression
    • Performance metrics
  • Data preprocessing
    • Missing data
    • Train/test splits
  • Algorithm walkthroughs
    • Lasso
    • Decision trees
    • Random forests
    • Gradient boosted machines
    • SuperLearner ensembling
    • Principal component analysis
    • Hierarchical agglomerative clustering
  • Challenge questions

Getting started

Please follow the notes in


The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to run independently of one another.

After installing and loading the packages listed in 01-overview.Rmd, run all of the code in 02-preprocessing.Rmd to preprocess the data. Then open any one of the seven algorithm R Markdown files and choose "Run All" to see the results and visualizations!

Assumed participant background

We assume that participants have familiarity with:

  • Basic R syntax
  • Statistical concepts such as mean and standard deviation

Technology requirements

Please bring a laptop with the following:


Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!


The slides were made using xaringan, an R wrapper for remark.js. Check out Chapter 7 if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests, which includes an example implementation in R.



