mlmachine is a Python library that organizes and accelerates notebook-based machine learning experiments.


Novel Functionality

Easy, Elegant EDA

mlmachine creates beautiful and informative EDA panels with ease:

# create EDA panel for all "category" features
for feature in["category"]:


Pandas-in / Pandas-out Pipelines

mlmachine makes Scikit-learn transformers Pandas-friendly.

Here's an example. See how simply wrapping the mlmachine utility PandasTransformer() around OneHotEncoder() maintains our DataFrame:


KFold Target Encoding

mlmachine includes a utility called KFoldEncoder, which applies target encoding to categorical features and uses out-of-fold encoding to prevent target leakage:

# perform 5-fold target encoding with TargetEncoder from the category_encoders library
encoder = KFoldEncoder(
    cv=KFold(n_splits=5, shuffle=True, random_state=0),

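
To make the out-of-fold idea concrete, here is a minimal sketch that uses only pandas and scikit-learn's KFold. It does not reproduce KFoldEncoder; the function `out_of_fold_target_encode`, the toy columns, and the fallback to the fold mean for unseen categories are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def out_of_fold_target_encode(categories, target, cv):
    """Encode each row with the target mean of its category, computed
    only from rows that fall in the other folds."""
    encoded = pd.Series(np.nan, index=categories.index, dtype=float)
    for fit_idx, holdout_idx in cv.split(categories):
        fit_cats = categories.iloc[fit_idx]
        fit_target = target.iloc[fit_idx]
        # category means learned without looking at the held-out rows
        fold_means = fit_target.groupby(fit_cats).mean()
        held_out = categories.iloc[holdout_idx].map(fold_means)
        # assumption: categories unseen in the fitting rows get the fold mean
        encoded.iloc[holdout_idx] = held_out.fillna(fit_target.mean()).to_numpy()
    return encoded


# toy data; the column names are illustrative only
df = pd.DataFrame({
    "cabin": ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"],
    "survived": [1, 0, 1, 1, 0, 1, 1, 0, 0, 1],
})
cv = KFold(n_splits=5, shuffle=True, random_state=0)
df["cabin_encoded"] = out_of_fold_target_encode(df["cabin"], df["survived"], cv)
print(df)
```

Because a row's own target value never contributes to its encoding, the encoded feature cannot simply memorize the label.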

Crowd-sourced Feature Importance & Exhaustive Feature Selection

mlmachine employs a robust approach to estimating feature importance by using a variety of techniques:

  • Tree-based Feature Importance
  • Recursive Feature Elimination
  • Sequential Forward Selection
  • Sequential Backward Selection
  • F-value / p-value
  • Variance 
  • Target Correlation

All of these techniques run in a single execution, across multiple estimators and one or more scoring metrics:

# instantiate custom models
rf2 = RandomForestClassifier(max_depth=2)
rf4 = RandomForestClassifier(max_depth=4)
rf6 = RandomForestClassifier(max_depth=6)

# estimator list - default XGBClassifier, default
# RandomForestClassifier and three custom models
estimators = [

# instantiate FeatureSelector object
fs = mlmachine_titanic_machine.FeatureSelector(

# run feature importance techniques, use ROC AUC and
# accuracy score metrics and 0 CV folds (where applicable)
feature_selector_summary = fs.feature_selector_suite(
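
As a rough illustration of how the results of several techniques might be combined, the sketch below scores synthetic features with four of the listed techniques and averages the per-technique ranks. Averaging ranks is an assumption made for this example, not necessarily how mlmachine builds its summary, and the data, column names, and model settings are invented:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

# synthetic data; feature names f0..f5 are illustrative only
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# one column of raw scores per technique, one row per feature
scores = pd.DataFrame(index=X.columns)
forest = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
scores["tree_importance"] = forest.fit(X, y).feature_importances_
scores["f_value"] = f_classif(X, y)[0]
scores["variance"] = X.var()
scores["target_correlation"] = X.corrwith(pd.Series(y, index=X.index)).abs()

# rank within each technique (1 = most important), then average the ranks
ranks = scores.rank(ascending=False)
summary = ranks.assign(average_rank=ranks.mean(axis=1)).sort_values("average_rank")
print(summary)
```

Working with ranks rather than raw scores keeps techniques with very different scales, such as F-values and variances, comparable.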

Then the features are winnowed away, from least important to most important, through an exhaustive cross-validation procedure in search of an optimum feature subset:

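
The winnowing step can be sketched in plain scikit-learn. This is a simplified illustration rather than mlmachine's procedure: the ranking here comes from absolute target correlation alone, and the estimator, the metric, and the synthetic data are assumptions:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic data; feature names f0..f5 are illustrative only
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# stand-in ranking, most important first
ranking = X.corrwith(pd.Series(y)).abs().sort_values(ascending=False).index.tolist()

estimator = LogisticRegression(max_iter=1000)
rows = []
features = list(ranking)
while features:
    # cross-validate the current subset, then drop its least important feature
    accuracy = cross_val_score(
        estimator, X[features], y, cv=5, scoring="accuracy"
    ).mean()
    rows.append(
        {"n_features": len(features), "features": tuple(features), "accuracy": accuracy}
    )
    features = features[:-1]

results = pd.DataFrame(rows)
best = results.loc[results["accuracy"].idxmax()]
print(results[["n_features", "accuracy"]])
print("best subset:", best["features"])
```

Every subset along the ranking is scored with cross-validation, so the chosen subset is the one with the best validated score rather than an arbitrary cutoff.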

Hyperparameter Tuning with Bayesian Optimization

mlmachine can perform Bayesian optimization on multiple estimators in one shot, and includes functionality for visualizing model performance and parameter selections:

# generate parameter selection panels for each parameter


Example Notebooks

All examples can be viewed here

Example Notebook 1 - Learn the basics of mlmachine, how to create EDA panels, and how to execute Pandas-friendly Scikit-learn transformations and pipelines.

Example Notebook 2 - Learn how to use mlmachine to assess a dataset's preprocessing needs. See examples of how to use novel functionality, such as GroupbyImputer(), KFoldEncoder() and DualTransformer().

Example Notebook 3 - Learn how to perform thorough feature importance estimation, followed by an exhaustive, cross-validation-driven feature selection process.

Example Notebook 4 - Learn how to execute hyperparameter tuning with Bayesian optimization for multiple models and multiple parameter spaces in one simple execution.

Articles on Medium

mlmachine - Clean ML Experiments, Elegant EDA & Pandas Pipelines - Published 4/3/2020

mlmachine - GroupbyImputer, KFoldEncoder, and Skew Correction - Published 4/13/2020


Python Requirements: 3.6, 3.7

mlmachine uses the latest or near-latest versions of all dependencies, so installing it in a virtual environment is highly recommended.


Create a new virtual environment with pyenv:

$ pyenv virtualenv 3.7.5 mlmachine-env

Activate your new virtual environment:

$ pyenv activate mlmachine-env

Use pip to install mlmachine and all dependencies:

$ pip install mlmachine


Create a new virtual environment with conda:

$ conda create --name mlmachine-env python=3.7

Activate your new virtual environment:

$ conda activate mlmachine-env

Use pip to install mlmachine and all dependencies:

$ pip install mlmachine


Any and all feedback is welcome. Please send me an email at [email protected]


mlmachine stands on the shoulders of many great Python packages:

catboost | category_encoders | eif | hyperopt | imbalanced-learn | jupyter | lightgbm | matplotlib | numpy | pandas | prettierplot | scikit-learn | scipy | seaborn | shap | statsmodels | xgboost

