awesome-automl

Papers
blogs & articles & book
Libraries
Projects
benchmark

This list collects resources related to automated machine learning. Some links were found with the keywords "automl", "autodl", "automated machine learning", "hyperparameter optimization", "neural architecture search", and "meta learning".

You can take part in the AutoML Challenge or the AutoDL Challenge,
   or find competitions on Kaggle,
   or get search results from Reddit, Bing, or Quora (use search keywords such as "automatic machine learning", "automl", "meta learning", and "automated machine learning"),
   or visit the automl website,
   or search for your keyword in the arXiv papers info,
   or look elsewhere to find good resources.


These papers, books, and slides are ordered by year. Each entry is preceded by the theme it belongs to; to focus on one theme, e.g. "Architecture Search", press Ctrl+F and search for it to highlight the matching papers.
The themes are as follows:

  • 1.【Architecture Search】:
    【Random Search】; 【Evolutionary Algorithms】;【Transfer Learning】;【Reinforcement Learning】;【Local Search】;
  • 2.【Hyperparameter Optimization】:
    【Bayesian Optimization】;【Meta Learning】;【Particle Swarm Optimization】;【Lipschitz Functions】;【Random Search】;【Transfer Learning】;【Local Search】;
  • 3.【Multi-Objective NAS】;
  • 4.【Automated Feature Engineering】;【Reinforcement Learning】;【Meta Learning】;    
  • 5.【Frameworks】;
  • 6.【Meta Learning】;
  • 7.【Miscellaneous】

P.S. The themes are a bit confusing; I will revise them later.


Papers

1990

2002

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018


SURVEY


blogs & articles & book

2008

2016

2017

2018


Libraries

[S:Structured Data; I:Image; A:Audio; N:NLP]

  • mlpapers/automl:
  • shukwong/awesome_automl_libraries:
  • Rakib091998/Auto_ML
  • DeepHiveMind/AutoML_AutoKeras_HPO
  • theainerd/automated-machine-learning:libraries
  • [SIAN]awslabs/autogluon: AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data
  • awslabs/adatune: AdaTune is a library to perform gradient based hyperparameter tuning for training deep neural networks. AdaTune currently supports tuning of the learning_rate parameter but some of the methods implemented here can be extended to other hyperparameters like momentum or weight_decay etc. AdaTune provides the following gradient based hyperparameter tuning algorithms - HD, RTHO and our newly proposed algorithm, MARTHE. The repository also contains other commonly used non-adaptive learning_rate adaptation strategies like staircase-decay, exponential-decay and cosine-annealing-with-restarts. The library is implemented in PyTorch
  • pycaret/pycaret: PyCaret is an open source low-code machine learning library in Python that aims to reduce the hypothesis-to-insights cycle time in an ML experiment. It enables data scientists to perform end-to-end experiments quickly and efficiently. In comparison with other open source machine learning libraries, PyCaret is an alternative low-code library that can be used to perform complex machine learning tasks with only a few lines of code. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy and many more.
  • [IN]microsoft/nni: An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning
  • [SIAN]uber/ludwig: Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code
  • FeatureLabs/Featuretools: a good library for automatically engineering features from relational and transactional data
  • automl/auto-sklearn: it's really a drop-in replacement for scikit-learn estimators.
  • automl/HPOlib2: HPOlib2 is a library for hyperparameter optimization and black box optimization benchmarks.
  • automl/Auto-Pytorch: Automatic architecture search and hyperparameter optimization for PyTorch
  • automl/RoBO: RoBO uses the Gaussian processes library george and the random forests library pyrfr.
  • automl/Auto-WEKA: Repository for Auto-WEKA, which provides automatic selection of models and hyperparameters for WEKA
  • automl/SMAC3: SMAC is a tool for algorithm configuration to optimize the parameters of arbitrary algorithms across a set of instances. This also includes hyperparameter optimization of ML algorithms. The main core consists of Bayesian Optimization in combination with an aggressive racing mechanism to efficiently decide which of two configurations performs better
  • NVIDIA/Milano: Milano (Machine learning autotuner and network optimizer) is a tool for enabling machine learning researchers and practitioners to perform massive hyperparameters and architecture searches
  • facebook/AX;github: Ax is an accessible, general-purpose platform for understanding, managing, deploying, and automating adaptive experiments. Adaptive experimentation is the machine-learning guided process of iteratively exploring a (possibly infinite) parameter space in order to identify optimal configurations in a resource-efficient manner. Ax currently supports Bayesian optimization and bandit optimization as exploration strategies. Bayesian optimization in Ax is powered by BoTorch, a modern library for Bayesian optimization research built on PyTorch
  • pytorch/BOTORCH;github: BoTorch is a library for Bayesian Optimization built on PyTorch
  • google-research/automl_zero: AutoML-Zero aims to automatically discover computer programs that can solve machine learning tasks, starting from empty or random programs and using only basic math operations. The goal is to simultaneously search for all aspects of an ML algorithm—including the model structure and the learning strategy—while employing minimal human bias.
  • kubeflow/katib: Katib is a Kubernetes-based system for Hyperparameter Tuning and Neural Architecture Search. Katib supports a number of ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others
  • [I]keras-team/AutoKeras: An AutoML system based on Keras. It is developed by DATA Lab at Texas A&M University. The goal of AutoKeras is to make machine learning accessible for everyone
  • keras-team/keras-tuner: Hyperparameter tuning for humans
  • HDI-Project/AutoBazaar: AutoBazaar is an AutoML system created using The Machine Learning Bazaar, a research project and framework for building ML and AutoML systems by the Data To AI Lab at MIT.
  • HDI-Project/BTB: BTB ("Bayesian Tuning and Bandits") is a simple, extensible backend for developing auto-tuning systems such as AutoML systems. It provides an easy-to-use interface for tuning and selection
  • tensorflow/adanet: AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet builds on recent AutoML efforts to be fast and flexible while providing learning guarantees. Importantly, AdaNet provides a general framework for not only learning a neural network architecture, but also for learning to ensemble to obtain even better models
  • IBM/lale: Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion. If you are a data scientist who wants to experiment with automated machine learning, this library is for you! Lale adds value beyond scikit-learn along three dimensions: automation, correctness checks, and interoperability. For automation, Lale provides a consistent high-level interface to existing pipeline search tools including Hyperopt, GridSearchCV, and SMAC. For correctness checks, Lale uses JSON Schema to catch mistakes when there is a mismatch between hyperparameters and their type, or between data and operators. And for interoperability, Lale has a growing library of transformers and estimators from popular libraries such as scikit-learn, XGBoost, PyTorch etc. Lale can be installed just like any other Python package and can be edited with off-the-shelf Python tools such as Jupyter notebooks
  • CiscoAI/amla: AMLA is a common framework to run different AutoML algorithms for neural networks without changing the underlying systems needed to configure, train and evaluate the generated networks.
  • ARM-software/mango: Mango is a python library for parallel optimization over complex search spaces. Currently, Mango is intended to find the optimal hyperparameters for machine learning algorithms. Check out the quick 12 seconds demo of Mango approximating a complex decision boundary of SVM
  • mindsdb/mindsdb: MindsDB is an Explainable AutoML framework for developers built on top of Pytorch. It enables you to build, train and test state of the art ML models in as simple as one line of code
  • EpistasisLab/TPOT: TPOT uses genetic programming to find the best-performing ML pipelines, and it is built on top of scikit-learn
  • Neuraxio/Neuraxle: A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
  • deephyper/deephyper: DeepHyper is an automated machine learning (AutoML) package for deep neural networks. It comprises two components: 1) Neural architecture search is an approach for automatically searching for a high-performing deep neural network architecture. 2) Hyperparameter search is an approach for automatically searching for high-performing hyperparameters for a given deep neural network. DeepHyper provides an infrastructure that targets experimental research in neural architecture and hyperparameter search methods, scalability, and portability across HPC systems. It comprises three modules: benchmarks, a collection of extensible and diverse benchmark problems; search, a set of search algorithms for neural architecture search and hyperparameter search; and evaluators, a common interface for evaluating hyperparameter configurations on HPC platforms
  • dataloop-ai/zazuml: This is an easy open-source AutoML framework for object detection. Currently this project contains a model & hyper-parameter tuner, auto augmentations, trial manager and prediction trigger, already loaded with your top-performing model checkpoint. A working pipeline ready to be plugged into your product, simple as that
  • Ashton-Sidhu/aethos: Aethos is a library/platform that automates your data science and analytical tasks at any stage in the pipeline. Aethos is, at its core, a uniform API that helps automate analytical techniques from various libraries such as pandas, scikit-learn, gensim, etc
  • hyperopt/Hyperopt-sklearn: Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn.
  • SigOpt: SigOpt is a standardized, scalable, enterprise-grade optimization platform and API designed to unlock the potential of your modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process.
  • [S]H2O-offical website; H2O-github: Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc
  • [S]MLJAR;github: An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides explanations and markdown reports.
  • autogoal/autogoal:AutoGOAL is a Python library for automatically finding the best way to solve a given task. It has been designed mainly for Automated Machine Learning (aka AutoML) but it can be used in any scenario where you have several possible ways (i.e., programs) to solve a given task
  • optuna/optuna: Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API. Thanks to our define-by-run API, the code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters
  • DataCanvasIO/Hypernets: Hypernets is a general AutoML framework on which automatic optimization tools can be implemented for various machine learning frameworks and libraries, including deep learning frameworks such as TensorFlow, Keras, and PyTorch, and machine learning libraries such as sklearn, LightGBM, and XGBoost. It introduces an abstract search space representation that takes into account the requirements of both hyperparameter optimization and neural architecture search (NAS), making Hypernets a general framework that can adapt to various automated machine learning needs.
  • LGE-ARC-AdvancedAI/Auptimizer: Auptimizer is an optimization tool for Machine Learning (ML) that automates many of the tedious parts of the model building process. Currently, Auptimizer helps with: 1) Automating tedious experimentation - Start using Auptimizer by changing just a few lines of your code. It will run and record sophisticated hyperparameter optimization (HPO) experiments for you, resulting in effortless consistency and reproducibility. 2) Making the best use of your compute resources - Whether you are using a couple of GPUs or AWS, Auptimizer will help you orchestrate compute resources for faster hyperparameter tuning. 3) Getting the best models in minimum time - Generate optimal models and achieve better performance by employing state-of-the-art HPO techniques. Auptimizer provides a single seamless access point to top-notch HPO algorithms, including Bayesian optimization and multi-armed bandit. You can even integrate your own proprietary solution.
  • fmfn/BayesianOptimization: This is a constrained global optimization package built upon Bayesian inference and Gaussian processes that attempts to find the maximum value of an unknown function in as few iterations as possible. This technique is particularly suited for optimization of high-cost functions and for situations where the balance between exploration and exploitation is important
  • rmcantin/BayesOpt: BayesOpt is an efficient implementation of the Bayesian optimization methodology for nonlinear optimization, experimental design and hyperparameter tuning
  • Angel-ML/automl: Angel-AutoML provides automatic hyper-parameter tuning and feature engineering operators. It is developed with Scala. As a stand-alone library, Angel-AutoML can be easily integrated in Java and Scala projects.
  • auto-flow/auto-flow: automatic machine learning workflow modeling platform
  • scikit-optimize/Scikit-Optimize: Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization. skopt aims to be accessible and easy to use in many contexts
  • cod3licious/autofeat: Linear Prediction Models with Automated Feature Engineering and Selection
  • [S]Alex-Lekov/AutoML_Alex: State-of-the-art Automated Machine Learning Python library for Tabular Data
  • joeddav/DEvol: DEvol (DeepEvolution) is a basic proof of concept for genetic architecture search in Keras. The current setup is designed for classification problems, though this could be extended to include any other output type as well.
  • AutoViML/auto_ts: auto-ts is an Automated ML library for time series data. auto-ts enables you to build and select multiple time series models using techniques such as ARIMA, SARIMAX, VAR, decomposable (trend+seasonality+holidays) models, and ensemble machine learning models.
  • gfluz94/aautoml-gfluz: This is a library developed to incorporate useful properties and methods in relevant data science packages, such as scikit-learn and pycaret, in order to provide a pipeline which suits every supervised problem. Therefore, data scientists can spend less time working on building pipelines and use this time more wisely to create new features and tune the best model.
  • societe-generale/aikit: Automatic Tool Kit for Machine Learning and Datascience. The objective is to provide tools to ease the repetitive part of the DataScientist job and so that he/she can focus on modelization. This package is still in alpha and more features will be added
  • SoftwareAG/mlw: ML Workbench is an open source machine learning and artificial intelligence platform for data scientists to solve business problems faster, build prototypes, and convert them into actual projects. The modeler helps from data preparation to model building and deployment; the tool supports a large variety of algorithms that can be run without a single line of code. The web-based tool has various components that help data scientists of different skill levels perform several model building tasks, and it provides deployment-ready PMML files which can be hosted as REST services. ML Workbench allows its users to cover a wide variety of algorithms and deep neural network architectures in a minimal-code or no-code environment. It is also one of the few deep learning platforms to support the Predictive Model Markup Language (PMML) format; PMML allows different statistical and data mining tools to speak the same language.
  • souryadey/deep-n-cheap: This repository implements Deep-n-Cheap – an AutoML framework to search for deep learning models
  • deil87/automl-genetic: Here we are trying to employ evolutionary algorithms and concepts to search the space of classifiers. In particular, we are interested in the automatic construction of ensembles of classifiers because nowadays they have proved to be very efficient
  • CleverInsight/cognito: Cognito is an exclusive python data preprocessing library and command line utility that helps any developer to transform raw data into a machine-learning format. We at CleverInsight Open Ai Foundation took the initiative to build a better automated data preprocessing library and here it is
  • kxsystems/automl: The automated machine learning library described here is built largely on the tools available within the machine learning toolkit available here. The purpose of this framework is to provide users with the ability to automate the process of applying machine learning techniques to real-world problems. In the absence of expert machine learning engineers this handles the following processes within a traditional workflow
  • Media-Smart/volkstuner: volkstuner is an open source hyperparameter tuner.
  • mihaianton/automl: An automated Machine Learning pipeline for faster Data Science projects. Using Evolutionary Algorithms for Neural Architecture Search and State-Of-The-Art data engineering techniques towards building an out-of-the-box machine learning solution
  • epeters3/skplumber: An AutoML tool and lightweight ML framework for Scikit-Learn.
  • tristandeleu/pytorch-meta: A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch. Torchmeta contains popular meta-learning benchmarks, fully compatible with both torchvision and PyTorch's DataLoader
  • learnables/learn2learn: PyTorch Meta-learning Library for Researchers
  • dragonfly/Dragonfly: An open source python library for scalable Bayesian optimisation.
  • starlibs/AILibs: AILibs is a modular collection of Java libraries related to automated decision making. Its highlighted functionalities are: 1) Graph Search (jaicore-search): AStar, BestFirst, Branch & Bound, DFS, MCTS, and more; 2) Logic (jaicore-logic): Represent and reason about propositional and simple first order logic formulas; 3) Planning (jaicore-planning): State-space planning (STRIPS, PDDL), and hierarchical planning (HTN, ITN, PTN); 4) Reproducible Experiments (jaicore-experiments): Design and efficiently conduct experiments in a highly parallelized manner; 5) Automated Software Configuration (HASCO): Hierarchical configuration of software systems; 6) Automated Machine Learning (ML-Plan): Automatically find optimal machine learning pipelines in WEKA or sklearn
  • societe-generale/aikit: Automatic Tool Kit for Machine Learning and Datascience. The objective is to provide tools to ease the repetitive part of the DataScientist job and so that he/she can focus on modelization. This package is still in alpha and more features will be added. Its main features are: 1) improved and new "scikit-learn like" transformers; 2) GraphPipeline: an extension of sklearn Pipeline that handles more generic chains of transformations; 3) an AutoML to automatically search through several transformers and models.
  • PGijsbers/gama:GAMA is an AutoML package for end-users and AutoML researchers. It generates optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing (e.g. PCA, normalization) as well as a machine learning algorithm (e.g. Logistic Regression, Random Forests), with fine-tuned hyperparameter settings (e.g. number of trees in a Random Forest).To find these pipelines, multiple search procedures have been implemented. GAMA can also combine multiple tuned machine learning pipelines together into an ensemble, which on average should help model performance. At the moment, GAMA is restricted to classification and regression problems on tabular data. In addition to its general use AutoML functionality, GAMA aims to serve AutoML researchers as well. During the optimization process, GAMA keeps an extensive log of progress made. Using this log, insight can be obtained on the behaviour of the search procedure.
  • [S]BartekPog/modelcreator: Simple Python package for creating predictive models. This package contains a Machine which is meant to do the learning for you. It can automatically create a fitting predictive model for given data
  • microsoft/EconML:EconML is a Python package for estimating heterogeneous treatment effects from observational data via machine learning. This package was designed and built as part of the ALICE project at Microsoft Research with the goal to combine state-of-the-art machine learning techniques with econometrics to bring automation to complex causal inference problems. The promise of EconML:1)Implement recent techniques in the literature at the intersection of econometrics and machine learning;2)Maintain flexibility in modeling the effect heterogeneity (via techniques such as random forests, boosting, lasso and neural nets), while preserving the causal interpretation of the learned model and often offering valid confidence intervals;3)Use a unified API;4)Build on standard Python packages for Machine Learning and Data Analysis.
  • 【Commercial】AutoCross: 4Paradigm (第四范式)
  • Yelp/MOE: MOE (Metric Optimization Engine) is an efficient way to optimize a system's parameters, when evaluating parameters is time-consuming or expensive
  • flytxtds/AutoGBT: AutoGBT stands for Automatically Optimized Gradient Boosting Trees, and is used for AutoML in a lifelong machine learning setting to classify large volume high cardinality data streams under concept-drift. AutoGBT was developed by a joint team ('autodidact.ai') from Flytxt, Indian Institute of Technology Delhi and CSIR-CEERI as a part of NIPS 2018 AutoML Challenge (The 3rd AutoML Challenge: AutoML for Lifelong Machine Learning).
  • MainRo/xgbtune: XGBTune is a library for automated XGBoost model tuning. Tuning an XGBoost model is as simple as a single function call.
  • autonomio/talos: Talos radically changes the ordinary Keras workflow by fully automating hyperparameter tuning and model evaluation. Talos exposes Keras functionality entirely and there is no new syntax or templates to learn.
  • HunterMcGushion/hyperparameter_hunter: Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests. HyperparameterHunter provides a wrapper for machine learning algorithms that saves all the important data. Simplify the experimentation and hyperparameter tuning process by letting HyperparameterHunter do the hard work of recording, organizing, and learning from your tests — all while using the same libraries you already do. Don't let any of your experiments go to waste, and start doing hyperparameter optimization the way it was meant to be
  • ja-thomas/autoxgboost:autoxgboost aims to find an optimal xgboost model automatically using the machine learning framework mlr and the bayesian optimization framework mlrMBO.
  • ScottfreeLLC/AlphaPy: AlphaPy is a machine learning framework for both speculators and data scientists. It is written in Python with the scikit-learn, pandas, and Keras libraries, as well as many other helpful libraries for feature engineering and visualization
  • gdikov/hypertunity: A toolset for black-box hyperparameter optimisation.
  • laic-ufmg/recipe: Automated machine learning (AutoML) with grammar-based genetic programming
  • thomas-young-2013/alpha-ml: Alpha-ML is a high-level AutoML toolkit, written in Python
  • produvia/ai-platform:AI Platform aims to automate AI R&D tasks. Our vision is to create machine learning models to solve various computer science tasks. Our mission is to achieve automation of AI technologies.We are developing service-centered or task-focused machine learning models. These models, or AI services, solve distinct tasks or functions.Examples of AI tasks include:1)semantic segmentation (computer visions);2)machine translation (natural language processing);3)word embeddings (methodology);4)recommendation systems (miscellaneous);5)speech recognition (speech);6)atari games (playing games);7)link prediction (graphs);8)time series classification (time series);9)audio generation (audio);10)visual odometry (robots);11)music information retrieval (music);12)dimensionality reduction (computer code);13)decision making (reasoning);14)knowledge graphs (knowledge base);15)adversarial attack (adversarial).
  • wywongbd/autocluster: autocluster is an automated machine learning (AutoML) toolkit for performing clustering tasks
  • ksachdeva/scikit-nni: Microsoft NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments. The tool dispatches and runs trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments like local machine, remote servers and cloud
  • SaltWaterStudio/modgen: This program was created for rapid feature engineering without the need to optimize each model. Modgen is designed to develop a quick overview of how your updated features will react to each model. You can use one specific algorithm or a wide variety (depending on your interests) with a random feature range which can be easily changed at anytime by the user
  • gomerudo/automl: The Automated Machine Learning process is intended to automatically discover well-performing pipelines that solve a machine learning problem such as classification or regression
  • crawles/automl_service: Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving. The framework implements a fully automated time series classification pipeline, automating both feature engineering and model selection and optimization using Python libraries, TPOT and tsfresh
  • georgianpartners/foreshadow: Foreshadow is an automatic pipeline generation tool that makes creating, iterating, and evaluating machine learning pipelines a fast and intuitive experience allowing data scientists to spend more time on data science and less time on code
  • ypeleg/HungaBunga: Brute Force all scikit-learn models and all scikit-learn parameters with fit predict
  • onepanelio/automl: Onepanel's AutoML framework was built to improve the accuracy of your machine learning models and make them more accessible by automatically creating a data analysis pipeline that can include data pre-processing, feature selection, and feature engineering methods along with machine learning methods and parameter settings that are optimized for your data
  • accurat/ackeras: AutoML library for Accurat, based on AutoKeras and Scikit-Learn
  • bhat-prashant/reinforceML: A handy Data Science Assistant for beginners and experts alike. ReinforceML is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming and reinforcement learning
  • reiinakano/Xcessive: A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python
  • minimaxir/automl-gs: automl-gs is an AutoML tool which, unlike Microsoft's NNI, Uber's Ludwig, and TPOT, offers a zero code/model definition interface to getting an optimized model and data transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice). automl-gs is designed for citizen data scientists and engineers without a deep statistical background under the philosophy that you don't need to know any modern data preprocessing and machine learning engineering techniques to create a powerful prediction workflow
  • cc-hpc-itwm/PHS: phs is an ergonomic framework for performing hyperparameter searches of any arbitrary Python function on numerous compute instances. This is achieved with minimal modifications inside your target function. Possible applications appear in expensive-to-evaluate numerical computations which strongly depend on hyperparameters, such as machine learning
  • tobegit3hub/advisor: Advisor is the hyper parameters tuning system for black box optimization
  • HIPS/Spearmint: Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code name spearmint) in a manner that iteratively adjusts a number of parameters so as to minimize some objective in as few runs as possible
  • claesenm/Optunity: Optunity is a library containing various optimizers for hyperparameter tuning. Hyperparameter tuning is a recurrent problem in many machine learning tasks, both supervised and unsupervised. Tuning examples include optimizing regularization or kernel parameters
  • cmccarthy1/automl:The automated machine learning library described here is built largely on the tools available within the machine learning toolkit. The purpose of this framework is to provide users with the ability to automate the process of applying machine learning techniques to real-world problems. In the absence of expert machine learning engineers this handles the following processes within a traditional workflow
  • zygmuntz/HyperBand: The goal is to provide a fully functional implementation of Hyperband, as well as a number of ready to use functions for a number of models (classifiers and regressors)
  • ClimbsRocks/auto_ml: Automates the whole machine learning process, making it super easy to use for both analytics and getting real-time predictions in production. A quick overview of buzzwords, this project automates: 1) Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you're trying to predict). 2) Feature Engineering (particularly around dates, and NLP). 3) Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse data). 4) Feature Selection (picking only the features that actually prove useful). 5) Data formatting (turning a DataFrame or a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems, etc). 6) Model Selection (which model works best for your problem; we try roughly a dozen apiece for classification and regression problems, including favorites like XGBoost if it's installed on your machine). 7) Hyperparameter Optimization (what hyperparameters work best for that model). 8) Big Data (feed it lots of data; it's fairly efficient with resources). 9) Unicorns (you could conceivably train it to predict what is a unicorn and what is not). 10) Ice Cream (mmm, tasty...). 11) Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).
  • jgreenemi/Parris:Parris is a tool for automating the training of machine learning algorithms. If you're the kind of person that works on ML algorithms and spends too much time setting up a server to run it on, having to log into it to monitor its progress, etc., then you will find this tool helpful. No need to SSH into instances to get your training jobs done
  • ziyuw/rembo: Bayesian optimization in high-dimensions via random embedding.
  • kootenpv/xtoy:Automated Machine Learning: go from 'X' to 'y' without effort
  • jesse-toftum/cash_ml:Automates the whole machine learning process, making it super easy to use for both analytics, and getting real-time predictions in production
  • CCQC/PES-Learn:PES-Learn is a Python library designed to fit system-specific Born-Oppenheimer potential energy surfaces using modern machine learning models. PES-Learn assists in generating datasets, and features Gaussian process and neural network model optimization routines. The goal is to provide high-performance models for a given dataset without requiring user expertise in machine learning.
  • AlexIoannides/ml-workflow-automation:Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.
  • yeticloud/dama:a simplified machine learning container platform that helps teams get started with an automated workflow
  • lai-bluejay/diego: Diego: Data in, IntElliGence Out. A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (Study) and generate correlated trials (Trial). Then run the code and get a machine learning model. Implemented following the Scikit-learn API glossary, using Bayesian optimization and genetic algorithms.
  • 【Commercial】DarwinML: 探智立方
  • 【Commercial】Cloud AutoML:
  • 【Commercial】MateLabs:
  • 【Commercial】DataRobot:
  • mb706/automlr: automlr is an R-package for automatically configuring mlr machine learning algorithms so that they perform well. It is designed for simplicity of use and able to run with minimal user intervention
  • XanderHorn/autoML:Automated machine learning in R
  • DataSystemsGroupUT/SmartML: SmartML is an R-Package representing a meta learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about the meta-features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs
  • PaddlePaddle/AutoDL: The open-sourced AutoDL Design is one implementation of the AutoDL technique.
  • linxihui/lazyML: An R package that aims to automatically select models and tune parameters, built upon the popular package caret. The main function mpTune can tune hyper-parameters of a list of models simultaneously with parallel support. It also has functionality to give an unbiased performance estimate of the mpTune procedure. Currently, classification, regression and survival models are supported.
  • darvis-ai/Brainless: Automated machine learning library using random search and the CASH technique.
  • r-tensorflow/autokeras: AutoKeras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. AutoKeras provides functions to automatically search for architecture and hyperparameters of deep learning models.
  • IBM/AutoMLPipeline.jl: AutoMLPipeline is a package that makes it trivial to create complex ML pipeline structures using simple expressions. It leverages the built-in macro programming features of Julia to symbolically process and manipulate pipeline expressions, and makes it easy to discover optimal structures for machine learning prediction and classification.
  • SciML/ModelingToolkit.jl:A modeling framework for automatically parallelized scientific machine learning (SciML) in Julia. A computer algebra system for integrated symbolics for physics-informed machine learning and automated transformations of differential equations
  • SciML/DataDrivenDiffEq.jl:Data driven modeling and automated discovery of dynamical systems for the SciML Scientific Machine Learning organization
  • ClimbsRocks/machineJS:a fully-featured default process for machine learning- all the parts are here and have functional default values in place. Modify to your heart's delight so you can focus on the important parts for your dataset, or run it all the way through with the default values to have fully automated machine learning
  • automl-js/automl-js: Automated Machine Learning, done locally in browser or on a server with nodejs. Ground-up implementation of ML algorithms for both regression and classification, such as Decision Trees, Linear Models and Gradient Boosting with Decision Trees. The implementation is benchmarked against the excellent scikit-learn library and gives quite close, albeit somewhat smaller, scores (at most 1 percent lower classification accuracy on average).
  • duckladydinh/KotlinML
  • [paper]AutoStacker:
  • [paper]AlphaD3M:
  • [paper]VDS:
  • [paper]ExploreKit:
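Several of the libraries above combine model selection with hyperparameter optimization (the CASH setting: combined algorithm selection and hyperparameter optimization). As a rough illustration only, and not the method of any listed library, the idea can be sketched with plain random search. The search space, the `toy_loss` function and all names below are hypothetical stand-ins; a real system would train and cross-validate a model in place of `toy_loss`.

```python
import random

# Hypothetical search space: each algorithm owns its own hyperparameter samplers.
SEARCH_SPACE = {
    "ridge": {"alpha": lambda rng: 10 ** rng.uniform(-3, 1)},
    "knn": {"k": lambda rng: rng.randint(1, 15)},
}


def toy_loss(algorithm, params):
    """Stand-in for a validation loss (assumption: lower is better)."""
    if algorithm == "ridge":
        return 0.30 + abs(params["alpha"] - 0.1)
    return 0.25 + 0.02 * abs(params["k"] - 5)


def cash_random_search(n_trials, seed=0):
    """Jointly sample an algorithm and its hyperparameters; keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Pick the algorithm first, then sample only its own hyperparameters.
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {
            name: sample(rng)
            for name, sample in SEARCH_SPACE[algorithm].items()
        }
        loss = toy_loss(algorithm, params)
        if best is None or loss < best[0]:
            best = (loss, algorithm, params)
    return best
```

Calling `cash_random_search(200)` returns a `(loss, algorithm, params)` tuple. Treating the algorithm choice as just another sampled dimension is what makes the search space conditional: `k` only exists when `knn` is selected.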

Distributed Frameworks

  • intel-analytics/analytics-zoo: Analytics Zoo seamlessly scales TensorFlow, Keras and PyTorch to distributed big data (using Spark, Flink & Ray).
  • databricks/automl-toolkit: This package provides a number of different levels of API interaction, from the highest-level "default only" FamilyRunner to low-level APIs that allow for highly customizable workflows to be created for automated ML tuning and Inference
  • salesforce/TransmogrifAI: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.
  • hyperopt/Hyperopt: Distributed Asynchronous Hyperparameter Optimization in Python, for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
  • nusdbsystem/singa-auto: SINGA-AUTO is a distributed system that trains machine learning (ML) models and deploys trained models, built with ease-of-use in mind. To do so, it leverages automated machine learning (AutoML).
  • DataSystemsGroupUT/D-SmartML: An automated Machine Learning pipeline for faster Data Science projects. Uses Evolutionary Algorithms for Neural Architecture Search and state-of-the-art data engineering techniques to build an out-of-the-box machine learning solution.
  • UCBerkeley/MLBase: Implementing and consuming Machine Learning at scale are difficult tasks. MLbase is a platform addressing both issues, and consists of three components: MLlib, MLI, and ML Optimizer. 1) ML Optimizer: This layer aims to automate the task of ML pipeline construction. The optimizer solves a search problem over feature extractors and ML algorithms included in MLI and MLlib. The ML Optimizer is currently under active development. 2) MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions. A prototype of MLI has been implemented against Spark, and serves as a testbed for MLlib. 3) MLlib: Apache Spark's distributed ML library. MLlib was initially developed as part of the MLbase project, and the library is currently supported by the Spark community. Many features in MLlib have been borrowed from ML Optimizer and MLI, e.g., the model and algorithm APIs, multimodel training, sparse data support, design of local / distributed matrices, etc.
  • 【Commercial】Databricks/AutoML: The library receives a dataset as input and produces an optimized model as output. It extracts some characteristics of the dataset and uses an internal knowledge base to determine the best algorithm, then uses a Hyperband method to find the best hyperparameters for the selected algorithm.
  • AxeldeRomblay/MLBox: MLBox is another AutoML library. It supports distributed data processing, cleaning, formatting, and state-of-the-art algorithms such as LightGBM and XGBoost. It also supports model stacking, which allows you to combine an ensemble of models to generate a new model aiming to have better performance than the individual models.
  • HDI-Project/ATM: Auto Tune Models (ATM) is an AutoML system designed with ease of use in mind. In short, you give ATM a classification problem and a dataset as a CSV file, and ATM will try to build the best model it can. ATM is based on a paper of the same name, and the project is part of the Human-Data Interaction (HDI) Project at MIT.
  • HDI-Project/ATMSeer: ATMSeer is an interactive visualization tool for automated machine learning (AutoML). It helps users monitor an ongoing AutoML process, analyze the searched models, and refine the search space in real time through a multi-granularity visualization. This instantiation is built on top of the ATM AutoML system.
  • logicalclocks/maggy: Maggy is a framework for efficient asynchronous optimization of expensive black-box functions on top of Apache Spark. Compared to existing frameworks, maggy is not bound to stage based optimization algorithms and therefore it is able to make extensive use of early stopping in order to achieve efficient resource utilization.
  • automl/HpBandSter:a distributed Hyperband implementation on Steroids
  • giantcroc/featuretoolsOnSpark: Featuretools is a Python library for automated feature engineering. This repo is a simplified version of featuretools that uses its automatic feature generation framework. Instead of the fussy back-end architecture of featuretools, it mainly uses Spark DataFrames to achieve a faster feature generation process (10x+ speedup).
  • automl/bohb: a distributed Hyperband implementation on Steroids. This python 3 package is a framework for distributed hyperparameter optimization. It started out as a simple implementation of Hyperband (Li et al. 2017), and contains an implementation of BOHB (Falkner et al. 2018)
  • ray-project/ray: A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library
  • tqichun/distributed-SMAC3: Distributed Sequential Model-based Algorithm Configuration, forked from https://github.com/automl/SMAC3. This package is a re-implementation of the original SMAC tool (see reference below). However, the reimplementation differs slightly from the original SMAC. For comparisons against the original SMAC, we refer to a stable release of SMAC (v2) in Java which can be found here.
  • ccnt-glaucus/glaucus: Glaucus is a data-flow-based machine learning suite that incorporates an automated machine learning pipeline, simplifies the complex processes of machine learning algorithms, and applies distributed data-processing engines. It helps non-data-science professionals across domains benefit from powerful machine learning tools in a simple way. The platform integrates several data processing engines, including Spark, TensorFlow, and Scikit-learn, and establishes a set of easy-to-use design processes based on them. The user only needs to upload data, perform simple configuration, select an algorithm, and train it with automatic or manual parameter adjustment. The platform also provides a wealth of evaluation metrics for the trained model, so that non-professionals can make the most of machine learning in their fields.
  • pierre-chaville/automlk: Automated and distributed machine learning toolkit. This toolkit is designed to be integrated within a python project, but also independently through the interface of the app
  • takezoe/predictionio-template-automl: This is an Apache PredictionIO engine template which offers AutoML capability using TransmogrifAI. You can launch a prediction WebAPI service without any coding.
  • nginyc/rafiki: Rafiki is a distributed system that trains machine learning (ML) models and deploys trained models, built with ease-of-use in mind. To do so, it leverages automated machine learning (AutoML).
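Hyperband, which several entries above implement or build on, repeatedly runs successive halving with different trade-offs between the number of configurations and the budget per configuration. The following is a minimal single-process sketch of one successive-halving run, not the code of any listed project. The function names, the `eta` default and the toy loss are assumptions chosen for illustration.

```python
import math
import random


def toy_loss(learning_rate, budget):
    """Hypothetical loss: best near lr = 0.1, and improving with more budget."""
    return abs(math.log10(learning_rate) + 1) + 1.0 / budget


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round; survivors get eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank every surviving config by its loss at the current budget.
        ranked = sorted(survivors, key=lambda config: evaluate(config, budget))
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


def sample_learning_rates(n, seed=0):
    """Draw n learning rates log-uniformly from [1e-4, 1]."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(-4, 0) for _ in range(n)]
```

For example, `successive_halving(sample_learning_rates(27), toy_loss)` evaluates 27, then 9, then 3 configurations at budgets 1, 3 and 9. In a distributed setting, the evaluations inside each round are independent and are the natural unit to hand out to workers.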

Projects

benchmark

