
46. Online Learning and Cloud Model Deployment

46.1. Introduction

This challenge focuses on online incremental learning. You will deploy a model online, as specified below, so that it can both run inference on new data and learn incrementally from it.

46.2. Key Points

Online incremental learning
Cloud deployment of the model

In the previous challenge, you completed the mushroom classification task and built a Flask web service for model inference. In this challenge, using the same mushroom classification dataset, build a Flask service that supports online incremental learning according to the requirements below, then deploy it to the Render cloud service.

At the end of the machine learning model inference and deployment experiment, we provided a cloud API link for Titanic survival prediction, through which you can obtain the prediction results in real time.

import requests

# Send a request to the server to get the prediction results
sample = [
    {"pclass": 1, "sex": "male", "embarked": "C"},
    {"pclass": 2, "sex": "female", "embarked": "S"},
    {"pclass": 3, "sex": "male", "embarked": "Q"},
    {"pclass": 3, "sex": "female", "embarked": "S"},
]

# Please wait a moment: the Render online service has a cold-start delay
requests.post(url="https://titanic-demo.onrender.com", json=sample).content

                        

b'{"predict":["no","yes","no","no"]}\n'

That experiment hosted its code on Render. Render is a PaaS (Platform as a Service) that supports multiple programming languages, and you can register a free account to use it.

                      {exercise-start}
:label: chapter05_07_1

Challenge: Modify the code of the previous mushroom classification challenge to meet the requirements of this challenge and deploy it to the Render service.

Hint: The Render website is accessible from the Chinese mainland, but the registration process may require a VPN or proxy.

                      {exercise-end}

                    

The requirements that the challenge needs to meet are as follows:

Perform inference on the input sample and return the predicted class together with the probability of belonging to that class.
If the predicted class probability for the input sample is greater than 80%, use the sample and its predicted class to incrementally train the model; otherwise, ignore the sample.
Use the model saved after incremental training for the next inference request.
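
The three requirements above can be sketched as a single function. This is only a minimal sketch under stated assumptions, not the reference solution: it assumes the sample is already one-hot encoded, uses scikit-learn's `BernoulliNB` because it offers both `predict_proba` and `partial_fit`, and persists the model with `joblib`. The function name `predict_and_update` and the `model_path` argument are hypothetical.

```python
import joblib
import numpy as np

THRESHOLD = 0.8  # confidence cutoff taken from the challenge requirements


def predict_and_update(model_path, x):
    """Predict one encoded sample and update the saved model if the prediction is confident.

    Assumes `model_path` points to a fitted classifier that supports both
    predict_proba and partial_fit (for example BernoulliNB).
    """
    model = joblib.load(model_path)  # always start from the most recently saved model
    x = np.asarray(x).reshape(1, -1)

    proba = model.predict_proba(x)[0]
    best = int(np.argmax(proba))
    label, confidence = model.classes_[best], float(proba[best])

    updated = confidence > THRESHOLD
    if updated:
        # Pseudo-labeling: the predicted class stands in for the unverified true label.
        model.partial_fit(x, [label])
        joblib.dump(model, model_path)  # the next call loads this updated model

    return {
        "prediction": str(label),
        "predict_proba": confidence,
        "partial_fit": updated,
    }
```

Reloading the model on every call keeps the sketch simple; a real service might instead keep the model in memory and write it to disk only after an update.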

Use the source code of the Render-based Titanic deployment from the experiment, https://github.com/huhuhang/titanic-demo, as a template: deploy the trained model to Render so that POST requests sent from your local machine to the cloud return inference results.

In that experiment, the code was stored in a GitHub repository, and the repository was connected to Render to complete the deployment. Model your configuration files on those in the repository. For help with developer tools such as GitHub and Render, consult their official documentation and use search engines; this part of the challenge calls for some self-study and a willingness to explore.

Dataset download address:

                      https://cdn.aibydoing.com/aibydoing/files/mushrooms.csv  # Copy the link and paste it into the browser to download

                    

Challenge Test Instructions

We recommend completing this challenge offline. Use scikit-learn to train and save the model, and Flask to build the web application. Once the Flask service is running on Render, you can send a POST request from your local machine to the Render deployment link to obtain the inference result. For testing, use samples from the original dataset; the input data must be in JSON format.
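
A Flask application that accepts JSON over POST might be laid out as in the skeleton below. It is a sketch under assumptions rather than the structure of the reference repository: the route `/` is chosen because the sample requests in this chapter post to the root URL, and `infer` is a hypothetical placeholder where the real inference and incremental training would go.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def infer(payload):
    """Hypothetical placeholder: plug in real inference and incremental training here."""
    return {"prediction": None, "predict_proba": None, "partial_fit": False}


@app.route("/", methods=["POST"])
def predict():
    # silent=True makes get_json return None instead of raising on a non-JSON body.
    payload = request.get_json(silent=True)
    if payload is None:
        return jsonify({"error": "request body must be JSON"}), 400
    return jsonify(infer(payload))
```

Rejecting malformed input with a 400 response keeps client mistakes from surfacing as a 500 Internal Server Error. How the app is started on Render depends on the configuration files you adapt from the reference repository.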

Sample test code is as follows:

                          wget -nc https://cdn.aibydoing.com/aibydoing/files/mushrooms_test.csv

                        

import json

import pandas as pd
import requests

df = pd.read_csv("mushrooms_test.csv")  # Read the test dataset
sample_data = df.sample(1).to_json()  # Randomly pick 1 row for testing inference and serialize it as a JSON string
sample_json = json.loads(sample_data)  # Parse the JSON string produced by pandas into a Python object

# Send the data in a POST request; replace the link below with the URL of your own Render service
requests.post(url="https://mushrooms-prediction.onrender.com", json=sample_json).content

                        

                          b'<!doctype html>\n<html lang=en>\n<title>500 Internal Server Error</title>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>\n'
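
Note that `df.sample(1).to_json()` uses pandas' default column orientation, so the payload the server receives maps each column name to an `{index: value}` object. The sketch below shows one way the service could flatten such a payload into a plain row before encoding it for the model. The helper name `flatten_sample` and the two-column payload are hypothetical; the real columns come from `mushrooms_test.csv`.

```python
import json


def flatten_sample(payload):
    """Turn {"col": {"idx": value}, ...} into {"col": value, ...} for a single row."""
    row = {}
    for column, cells in payload.items():
        if not isinstance(cells, dict) or len(cells) != 1:
            raise ValueError(f"expected exactly one value for column {column!r}")
        (value,) = cells.values()  # unpack the only cell in this column
        row[column] = value
    return row


# Hypothetical two-column payload shaped like the output of DataFrame.to_json().
payload = json.loads('{"cap-shape": {"17": "x"}, "odor": {"17": "n"}}')
print(flatten_sample(payload))  # {'cap-shape': 'x', 'odor': 'n'}
```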

                        

Expected Output

A successful response should include at least: the predicted class of the sample (prediction), the probability of belonging to the predicted class (predict_proba), and whether incremental training was performed (partial_fit).

Note

This challenge is mainly an exercise in online model deployment and incremental training methods; the scheme itself has little practical value. Real-world scenarios generally have no such incremental training requirement, and a model cannot simply decide from the input data whether to train incrementally, because the true label of the sample has not been verified. In addition, the approach taken in this challenge leaves the model open to being "attacked", which could render it meaningless.