
40. A Review of Automated Machine Learning#

40.1. Introduction#

Automated machine learning (AutoML) is a branch of machine learning that has grown popular in recent years. It can be regarded as an artificial intelligence-based answer to the growing demand for machine learning applications. By handing steps such as algorithm selection, training, tuning, and deployment over to automated components, it lowers the barrier to developing machine learning models.

40.2. Key Points#

  • Concepts of Automated Machine Learning

  • Goals of Automated Machine Learning

40.3. Overview of Automated Machine Learning#

Automated machine learning (AutoML) is an area that has emerged in machine learning in recent years. An early form of the idea can be found in the ACM paper Auto-WEKA. Since machine learning itself already seems automated and intelligent, you might wonder where exactly the automation in AutoML lies.

Automated machine learning automates the end-to-end process of applying machine learning to real-world problems. In a typical machine learning workflow, developers must apply data preprocessing, feature engineering, feature extraction, and feature selection methods to make the dataset suitable for learning. After these preparation steps, they must select an appropriate algorithm and then choose its hyperparameters and optimization method.
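To make the manual workflow concrete, here is a minimal sketch using only the Python standard library. The data, the function names, and the threshold classifier are all hypothetical; the classifier is a deliberately trivial stand-in for a real algorithm. Each hand-written step corresponds to a decision that an AutoML tool would try to make on the developer's behalf.

```python
import statistics


def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def fit_threshold_classifier(column, labels):
    """Use the midpoint between the two class means as the decision threshold."""
    class_0 = [v for v, y in zip(column, labels) if y == 0]
    class_1 = [v for v, y in zip(column, labels) if y == 1]
    return (statistics.fmean(class_0) + statistics.fmean(class_1)) / 2


# Hypothetical raw data: one numeric feature with a missing value, binary labels.
raw_feature = [1.0, 2.0, None, 8.0, 9.0, 10.0]
labels = [0, 0, 1, 1, 1, 1]

# Step 1: preprocessing (imputation), step 2: scaling, step 3: model fitting.
feature = standardize(impute_missing(raw_feature))
threshold = fit_threshold_classifier(feature, labels)
predictions = [int(v > threshold) for v in feature]
print(predictions)
```

Even in this tiny case the developer had to choose an imputation strategy, a scaling method, and a model by hand.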

Many of these steps are beyond the capabilities of non-experts. Automated machine learning can therefore be regarded as an artificial intelligence-based solution to the growing demand for machine learning applications. The work is handed over to tools, so not every developer needs complete machine learning development skills.

[Figure: how AutoML works, as illustrated by Google Cloud]

Simply put, developers only need to provide data, such as images of different categories. The remaining steps, including algorithm selection, training, parameter tuning, and model deployment, can be handed over to the AutoML component. This greatly lowers the barrier to developing machine learning models.

There is no universally agreed classification of the objectives of automated machine learning. Drawing on the results of several research papers, we summarize the following 4 key objectives.

40.4. Automated Feature Engineering#

Automated Feature Engineering, abbreviated as Auto FE, mainly includes operations such as feature selection, feature extraction, meta-learning, and detecting and handling imbalanced or missing data.

Feature selection, also known as variable selection, attribute selection, or variable subset selection, is the process of selecting a relevant subset of features for model construction. Feature extraction can be regarded as a dimensionality reduction step: the initial data is reduced to a more manageable set of features for learning, while still describing the original data set accurately and completely. Methods include principal component analysis and independent component analysis.
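As an illustration of feature selection, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top k. This is one simple filter-style criterion chosen for the example, not a method prescribed by the text. The toy data and the function names `pearson` and `select_k_best` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_k_best(rows, target, k):
    """Return the indices of the k features most correlated with the target."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Hypothetical toy data: features 0 and 2 drive the target, 1 and 3 are noise.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
target = [2.0 * r[0] - 3.0 * r[2] + rng.gauss(0, 0.1) for r in rows]

selected = select_k_best(rows, target, k=2)
print(selected)
```

A correlation filter only detects linear relationships with the target, which is one reason automated tools typically try several selection strategies.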

Meta learning is a subfield of machine learning that addresses the problem of learning how to learn. Today, training a machine learning model usually depends on large-scale data, whereas meta learning builds a subsystem that learns from experience. This mimics the way humans learn: a person can tell houses from cars without first studying a large amount of data, because the ability comes from fast learning built on prior experience. Meta learning is still an emerging field. You can read An introduction to Meta-learning to understand the process of meta learning, or consult related research papers.

40.5. Automated Model Selection#

Automated Model Selection, abbreviated as AMS, aims to select the most suitable machine learning algorithm based on the characteristics of the data. In traditional machine learning, experts usually choose the model based on experience and cross-validation results. The model selection process also has to account for data sampling and model evaluation methods, and the high computational cost and time of cross-validation remain areas for further improvement and research.
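The cross-validation step that experts perform by hand can be sketched as follows. This is a minimal standard-library example under stated assumptions: the two candidate models, the synthetic data, and the helper `cross_val_mse` are hypothetical, and folds are taken as contiguous slices because the data is already in random order.

```python
import random


class MeanModel:
    """Baseline that always predicts the training mean."""

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean for _ in xs]


class LinearModel:
    """One-dimensional least-squares line y = a * x + b."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = sxy / sxx
        self.b = my - self.a * mx
        return self

    def predict(self, xs):
        return [self.a * x + self.b for x in xs]


def cross_val_mse(model_cls, xs, ys, k=5):
    """Mean squared error averaged over k contiguous folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        train_x, train_y = xs[:start] + xs[stop:], ys[:start] + ys[stop:]
        test_x, test_y = xs[start:stop], ys[start:stop]
        preds = model_cls().fit(train_x, train_y).predict(test_x)
        errors = [(p - t) ** 2 for p, t in zip(preds, test_y)]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Hypothetical data: a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.2) for x in xs]

candidates = {"mean": MeanModel, "linear": LinearModel}
scores = {name: cross_val_mse(cls, xs, ys) for name, cls in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Every candidate is retrained k times, which is where the computational cost mentioned above comes from.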

40.6. Hyperparameter Automatic Optimization#

Automatic hyperparameter optimization is usually called Hyperparameter Optimization, abbreviated as HPO. Automated model selection and hyperparameter optimization are complementary processes. Most machine learning algorithms have hyperparameters, and for a given algorithm, different hyperparameter settings can be regarded as different model structures. For this reason, automated machine learning research more often discusses hyperparameter optimization than automated model selection as a separate concept.
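One way to picture this relationship is to treat the choice of algorithm as just another dimension of the search space. The sketch below flattens a mapping from algorithm names to hyperparameter grids into a single list of concrete configurations. The algorithm names, hyperparameter names, and values are hypothetical placeholders and are not tied to any particular library.

```python
from itertools import product

# Hypothetical search space: each algorithm maps to its own hyperparameter grid.
search_space = {
    "knn": {"n_neighbors": [1, 5, 15]},
    "decision_tree": {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
}


def enumerate_configurations(space):
    """Flatten an {algorithm: grid} mapping into a list of concrete configurations."""
    configs = []
    for algorithm, grid in space.items():
        names = list(grid)
        # The Cartesian product yields every combination of values for this algorithm.
        for values in product(*(grid[name] for name in names)):
            configs.append({"algorithm": algorithm, **dict(zip(names, values))})
    return configs


configurations = enumerate_configurations(search_space)
print(len(configurations))
for config in configurations[:3]:
    print(config)
```

Once the space is flattened this way, selecting a model and tuning its hyperparameters become the same search problem.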

Hyperparameter optimization has been researched extensively. The methods can be roughly divided into: Bayesian optimization, evolutionary algorithms, Lipschitz-function-based methods, local search, random search, particle swarm optimization, meta learning, transfer learning, and others.
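Of the methods listed, random search is the simplest to sketch. The example below samples configurations independently and keeps the one with the lowest loss. The function `validation_loss` is a hypothetical stand-in for training a model and scoring it on held-out data, and the hyperparameter names and ranges are assumptions made for the illustration.

```python
import math
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for training a model and scoring it on held-out data.

    By construction the minimum sits at learning_rate = 0.01 and num_layers = 3.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 3) ** 2


def random_search(objective, n_trials, seed=0):
    """Sample configurations independently at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    history = []
    for _ in range(n_trials):
        config = {
            # Log-uniform over [1e-4, 1]: learning rates span orders of magnitude.
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "num_layers": rng.randint(1, 6),
        }
        loss = objective(**config)
        history.append(loss)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss, history


best_config, best_loss, history = random_search(validation_loss, n_trials=50)
print(best_config, round(best_loss, 4))
```

Random search ignores the results of earlier trials. Methods such as Bayesian optimization differ precisely in using that history to decide where to sample next.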

If you are interested, you can further read the paper links given above.

40.8. Automated Machine Learning Approaches#

Currently, there are two main ways to learn and apply automated machine learning tools:

  • Open-source frameworks: run locally with open-source tools such as Auto-Keras and auto-sklearn.

  • Commercial services: run in the cloud with tools from cloud service providers such as Google Cloud and Microsoft Azure.

Each approach has advantages and disadvantages. Open-source frameworks offer a high degree of customization and are easy to integrate, but they require substantial local computing power. Commercial services place few demands on the local environment and come with complete development documentation and technical support.

Below, we present a list of open-source frameworks and commercial services in the area of automated machine learning. Source

| Name | Supported Type | Programming Language | Open Source License | Official Website |
| --- | --- | --- | --- | --- |
| AdaNet | NAS | Python | Apache-2.0 | GitHub |
| Advisor | HPO | Python | Apache-2.0 | GitHub |
| AMLA | HPO, NAS | Python | Apache-2.0 | GitHub |
| ATM | HPO | Python | MIT | GitHub |
| Auger | HPO | Python | Commercial | Homepage |
| Auto-Keras | NAS | Python | License | GitHub |
| AutoML Vision | NAS | Python | Commercial | Homepage |
| AutoML Video Intelligence | NAS | Python | Commercial | Homepage |
| AutoML Natural Language | NAS | Python | Commercial | Homepage |
| AutoML Translation | NAS | Python | Commercial | Homepage |
| AutoML Tables | AutoFE, HPO | Python | Commercial | Homepage |
| auto-sklearn | HPO | Python | License | GitHub |
| auto_ml | HPO | Python | MIT | GitHub |
| BayesianOptimization | HPO | Python | MIT | GitHub |
| BayesOpt | HPO | C++ | AGPL-3.0 | GitHub |
| comet | HPO | Python | Commercial | Homepage |
| DataRobot | HPO | Python | Commercial | Homepage |
| DEvol | NAS | Python | MIT | GitHub |
| Driverless AI | AutoFE | Python | Commercial | Homepage |
| FAR-HO | HPO | Python | MIT | GitHub |
| H2O AutoML | HPO | Python, R, Java, Scala | Apache-2.0 | GitHub |
| HpBandSter | HPO | Python | BSD-3-Clause | GitHub |
| HyperBand | HPO | Python | License | GitHub |
| Hyperopt | HPO | Python | License | GitHub |
| Hyperopt-sklearn | HPO | Python | License | GitHub |
| Hyperparameter Hunter | HPO | Python | MIT | GitHub |
| Katib | HPO | Python | Apache-2.0 | GitHub |
| MateLabs | HPO | Python | Commercial | GitHub |
| Milano | HPO | Python | Apache-2.0 | GitHub |
| MLJAR | HPO | Python | Commercial | Homepage |
| nasbot | NAS | Python | MIT | GitHub |
| neptune | HPO | Python | Commercial | Homepage |
| NNI | HPO, NAS | Python | MIT | GitHub |
| Optunity | HPO | Python | License | GitHub |
| R2.ai | HPO | - | Commercial | Homepage |
| RBFOpt | HPO | Python | License | GitHub |
| RoBO | HPO | Python | BSD-3-Clause | GitHub |
| Scikit-Optimize | HPO | Python | License | GitHub |
| SigOpt | HPO | Python | Commercial | Homepage |
| SMAC3 | HPO | Python | License | GitHub |
| TPOT | AutoFE, HPO | Python | LGPL-3.0 | GitHub |
| TransmogrifAI | HPO | Scala | BSD-3-Clause | GitHub |
| Tune | HPO | Python | Apache-2.0 | GitHub |
| Xcessiv | HPO | Python | Apache-2.0 | GitHub |
| SmartML | HPO | R | GPL-3.0 | GitHub |

Among these, the following open-source projects are well maintained:

  • auto-sklearn: An automated machine learning framework developed based on scikit-learn and maintained by the automl.org organization.

  • Auto-Keras: An automated deep learning framework officially maintained by the Keras deep learning framework.

  • NNI: An automated deep learning framework officially maintained by Microsoft and supporting mainstream deep learning frameworks such as TensorFlow and PyTorch.

Among cloud service providers, Google Cloud and AWS offer strong automated machine learning services. Chinese cloud providers such as Tencent Cloud, Alibaba Cloud, and Baidu Cloud also support optimization goals such as NAS and HPO. Unfortunately, Google Cloud does not currently provide services in mainland China.

In addition, commercial companies such as H2O.ai, Transwarp, and 4paradigm provide capable automated machine learning platforms, although most of them are aimed at enterprise customers.

40.9. Summary#

In this experiment, we introduced automated machine learning, starting from the concept of machine learning. We focused on the goals of automated machine learning: automated feature engineering, automated model selection, hyperparameter optimization, and neural architecture search. Because automated machine learning is at the forefront of current academic research, some parts of its body of knowledge are not yet settled. In the following experiments, we will use automated machine learning tools in practical applications.
