LightAutoML (LAMA) is a framework from the Sberbank AI Lab AutoML group for automatic creation of classification and regression models.
Tasks currently available to solve:
Currently we work with datasets where each row is an object with its own features and target. Multitable datasets and sequences are under construction :)
Note: to create interpretable models automatically, we use the AutoWoE library, also made by our group.
LightAutoML video guides:
Articles about LightAutoML:
See the Documentation of LightAutoML.
To install the LAMA framework on your machine:
pip install -U lightautoml
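After installation, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the Python standard library; the helper name `installed_version` is an illustration and not part of LightAutoML.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("lightautoml")
    print("lightautoml:", version if version else "not installed")
```

Run it with the same interpreter (or virtual environment) that you used for `pip install`, otherwise it may report the package as missing.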
If you want to create a dedicated virtual environment for LAMA, install the python3-venv system package and run the following command, which creates the lama_venv virtual environment with LAMA inside:
To check this installation variant and run all the demo scripts, use the command below:
To install optional support for generating reports in PDF format, run the following commands:
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows

poetry install -E pdf
The built official documentation for LightAutoML is available online.
To find out how to work with LightAutoML, we have several tutorials:
Tutorial_1. Create your own pipeline.ipynb - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization, etc.
Tutorial_2. AutoML pipeline preset.ipynb - shows how to use LightAutoML presets (both standalone and time-utilized variants) to solve ML tasks on tabular data. With presets you can solve binary classification, multiclass classification and regression tasks by changing the first argument of Task.
Tutorial_3. Multiclass task.ipynb - shows how to build an ML pipeline for a multiclass task by hand.
Tutorial_4. SQL data source for pipeline preset.ipynb - shows how to use LightAutoML presets (both standalone and time-utilized variants) to solve ML tasks on tabular data from an SQL database instead of CSV.
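As a rough illustration of the preset workflow described in Tutorial_2, the sketch below picks the first argument of Task from the target column and then fits a tabular preset. The helper names `task_name_for` and `fit_preset`, the default timeout and the way the target is inspected are assumptions made for this example; the tutorials remain the authoritative reference.

```python
def task_name_for(target_values, regression=False):
    """Choose the Task name: 'reg', 'binary' or 'multiclass'."""
    if regression:
        return "reg"
    n_classes = len(set(target_values))
    if n_classes < 2:
        raise ValueError("classification needs at least two distinct target values")
    return "binary" if n_classes == 2 else "multiclass"


def fit_preset(train_df, target_column, task_name, timeout_seconds=300):
    """Fit a tabular preset on a pandas DataFrame (hypothetical wrapper)."""
    # Imported lazily so task_name_for works even without LightAutoML installed.
    from lightautoml.automl.presets.tabular_presets import TabularAutoML
    from lightautoml.tasks import Task

    automl = TabularAutoML(task=Task(task_name), timeout=timeout_seconds)
    # fit_predict returns out-of-fold predictions for the training data.
    oof_predictions = automl.fit_predict(train_df, roles={"target": target_column})
    return automl, oof_predictions
```

A call might look like `fit_preset(train_df, "TARGET", task_name_for(train_df["TARGET"]))`, where `train_df` and the `TARGET` column are placeholders for your own data.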
Each tutorial includes a step that enables the Profiler and ends with a Profiler run, which collects the distribution of call times for each function and presents it in an interactive HTML report: the report shows the full run time at the top, followed by an interactive call tree with the percentage of total time spent in each subtree.
Important 1: in production you do not need the profiler (it increases run time and memory consumption), so please do not turn it on - it is off by default.
Important 2: to look at this report after the run, comment out the last line of the demo, which contains the report deletion command.
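To make the idea of "percentage of total time per subtree" concrete, here is a minimal sketch that uses the standard library `cProfile` and `pstats` modules as a stand-in. It is not the LightAutoML Profiler and produces no HTML report; the function names `cumulative_time_shares` and `busy_work` are invented for the example.

```python
import cProfile
import pstats


def cumulative_time_shares(func, *args, **kwargs):
    """Run func under cProfile; return its result and each function's share (%) of total time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    total = stats.total_tt  # total time across all profiled calls
    shares = {}
    for (_file, _line, name), (_cc, _nc, _tt, cumulative, _callers) in stats.stats.items():
        # Cumulative time covers the function and everything it calls (its subtree).
        share = 100.0 * cumulative / total if total > 0 else 0.0
        shares[name] = max(shares.get(name, 0.0), share)
    return result, shares


def busy_work(n):
    """Toy workload used only for the demonstration."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, shares = cumulative_time_shares(busy_work, 100_000)
    for name, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{share:6.2f}%  {name}")
```

As with the Profiler in the tutorials, this kind of instrumentation slows the run down, so it belongs in experiments rather than in production code.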
Kaggle kernel examples of LightAutoML usage:
For more examples, the tests folder contains different scenarios of LightAutoML usage:
demo0.py - building an ML pipeline from blocks, then fitting and predicting with the pipeline itself.
demo1.py - creating several ML pipelines (using an importance-based cutoff feature selector) to build 2-level stacking with the AutoML class.
demo2.py - creating several ML pipelines (using an iterative feature selection algorithm) to build 2-level stacking with the AutoML class.
demo3.py - creating several ML pipelines (using a combination of cutoff and iterative FS algorithms) to build 2-level stacking with the AutoML class.
demo4.py - creating classification and regression tasks for AutoML with loss and evaluation metric setup.
demo5.py - 2-level stacking using the AutoML class with different algorithms on the first level, including LGBM, Linear and LinearL1.
demo6.py - AutoML with nested CV.
demo7.py - AutoML preset usage for tabular datasets (a predefined AutoML pipeline structure and a simple interface for users, without building from blocks).
demo8.py - creating pipelines from blocks to build AutoML for a multiclass classification task.
demo9.py - AutoML time utilization preset usage for tabular datasets (a predefined AutoML pipeline structure and a simple interface for users, without building from blocks).
demo10.py - creating pipelines from blocks (including CatBoost) to build AutoML for a multiclass classification task.
demo11.py - AutoML NLP preset usage for tabular datasets with text columns.
demo12.py - AutoML tabular preset usage with a custom validation scheme and multiprocessed inference.
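Several of the demos above rely on an importance-based cutoff for feature selection. The idea can be sketched in a few lines of plain Python; this is a conceptual illustration with a hypothetical helper name and made-up feature names, not the selector class used in the demos.

```python
def select_by_importance(importances, cutoff=0.0):
    """Keep features whose importance is strictly above the cutoff, most important first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > cutoff]


if __name__ == "__main__":
    # Hypothetical importances, e.g. taken from a fitted tree model.
    example = {"age": 0.5, "noise": 0.0, "income": 0.3}
    print(select_by_importance(example))
```

Raising the cutoff drops more features, trading a smaller model for potentially lower quality; the iterative selection used in demo2.py and demo3.py is a different algorithm and is not shown here.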
If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.
Write a message to us: