45. Dynamic Incremental Training of Machine Learning Models#

45.1. Introduction#

After the previous experiments, you should already be quite familiar with saving and deploying models using scikit-learn. In this experiment, you will learn what incremental training is, and how to deploy and invoke dynamically and incrementally trained models.

45.2. Key Points#

  • Dynamic models

  • Incremental training

  • Real-time handwritten character recognition

45.3. Static Models and Dynamic Models#

In the previous experiment, we learned how to deploy a machine learning model online and perform dynamic inference. Inference is not the only process that comes in static and dynamic forms: the training process of a machine learning model can also be divided into these two categories.

  • Static models use offline training. Generally, the model is trained only once and then used for a long time.

  • Dynamic models use online training. Data continuously enters the system, and the system is updated continuously to integrate this data into the model.

In the previous experiments, we always trained offline and saved static models. In practice, once a machine learning model is deployed online, you may want it to keep learning from new data and be updated continuously.

[Figure: offline training on a large local dataset, followed by continued learning as incremental data arrives]

The process above can be understood as follows: offline training first fits the model on a large amount of local data; when incremental data arrives later, the model continues learning from its already optimized parameters. The advantage is that the model is in a continuous learning process rather than starting from scratch every time.

Of course, the idea sounds appealing. However, not every model supports online (incremental) training; whether it does depends on the characteristics of the algorithm itself and on the machine learning framework used.
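
A quick check is whether an estimator exposes the partial_fit method. Below is a minimal sketch; SGDClassifier and SVC are merely examples of a supporting and a non-supporting model:

from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# Estimators that support incremental training expose partial_fit
print(hasattr(SGDClassifier(), "partial_fit"))  # True: incremental training supported
print(hasattr(SVC(), "partial_fit"))  # False: offline training only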

In scikit-learn, the algorithms that support incremental training (all through the partial_fit method; see the usage sketch after this list) are:

  • Classification algorithms

    • sklearn.naive_bayes.MultinomialNB

    • sklearn.naive_bayes.BernoulliNB

    • sklearn.linear_model.Perceptron

    • sklearn.linear_model.SGDClassifier

    • sklearn.linear_model.PassiveAggressiveClassifier

    • sklearn.neural_network.MLPClassifier

  • Regression algorithms

    • sklearn.linear_model.SGDRegressor

    • sklearn.linear_model.PassiveAggressiveRegressor

    • sklearn.neural_network.MLPRegressor
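
All of the estimators above are trained incrementally with partial_fit. As a minimal sketch, batch-wise training with SGDClassifier on the DIGITS data (loaded here just for illustration) looks like this:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

digits = load_digits()
clf = SGDClassifier(random_state=1)

# On an unfitted classifier, the first partial_fit call must declare every
# possible label via classes=, because one batch may not contain all of them
clf.partial_fit(digits.data[:900], digits.target[:900], classes=np.arange(10))
clf.partial_fit(digits.data[900:], digits.target[900:])  # later calls omit classes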

Next, we use an artificial neural network to walk through the dynamic incremental training and deployment of a model. We again choose the DIGITS handwritten character dataset used before. For the needs of the experiment, we will later replace every value greater than 0 in the handwritten character matrices with 1.

import warnings

warnings.filterwarnings("ignore")
from sklearn.datasets import load_digits

digits = load_digits()  # Load the dataset

digits.data.shape, digits.target.shape
((1797, 64), (1797,))

Then, split the dataset into a training set and a test set.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=1, test_size=0.2
)

X_train.shape, X_test.shape, y_train.shape, y_test.shape
((1437, 64), (360, 64), (1437,), (360,))

Next, train the model on the training data and evaluate it on the test data. Passing verbose=1 to MLPClassifier prints the loss value at each iteration.

from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(random_state=1, verbose=1, max_iter=50)

model.fit(X_train, y_train)  # Train the model
y_pred = model.predict(X_test)  # Test the model
accuracy_score(y_test, y_pred)  # Accuracy
Iteration 1, loss = 7.02205935
Iteration 2, loss = 3.65516147
Iteration 3, loss = 2.47679869
Iteration 4, loss = 1.49613624
Iteration 5, loss = 1.00259484
Iteration 6, loss = 0.72002813
Iteration 7, loss = 0.54341224
Iteration 8, loss = 0.43746627
Iteration 9, loss = 0.36224450
Iteration 10, loss = 0.30940686
Iteration 11, loss = 0.26808400
Iteration 12, loss = 0.23881533
Iteration 13, loss = 0.21317742
Iteration 14, loss = 0.19387023
Iteration 15, loss = 0.17858371
Iteration 16, loss = 0.16540074
Iteration 17, loss = 0.15237040
Iteration 18, loss = 0.14083022
Iteration 19, loss = 0.13015872
Iteration 20, loss = 0.12388636
Iteration 21, loss = 0.11475134
Iteration 22, loss = 0.10716270
Iteration 23, loss = 0.10093849
Iteration 24, loss = 0.09392212
Iteration 25, loss = 0.08891589
Iteration 26, loss = 0.08473752
Iteration 27, loss = 0.08024667
Iteration 28, loss = 0.07630452
Iteration 29, loss = 0.07093241
Iteration 30, loss = 0.06705022
Iteration 31, loss = 0.06426208
Iteration 32, loss = 0.06073862
Iteration 33, loss = 0.05743292
Iteration 34, loss = 0.05524405
Iteration 35, loss = 0.05257737
Iteration 36, loss = 0.04949237
Iteration 37, loss = 0.04771388
Iteration 38, loss = 0.04545686
Iteration 39, loss = 0.04306707
Iteration 40, loss = 0.04101056
Iteration 41, loss = 0.03913876
Iteration 42, loss = 0.03854201
Iteration 43, loss = 0.03717838
Iteration 44, loss = 0.03520881
Iteration 45, loss = 0.03329344
Iteration 46, loss = 0.03247741
Iteration 47, loss = 0.03017486
Iteration 48, loss = 0.02957126
Iteration 49, loss = 0.02897609
Iteration 50, loss = 0.02674436
0.975

As you can see, the model achieves an accuracy of 97.5% on the test set. Next, we will find the samples that the model mispredicts.

n = 0
for i, (pred, test) in enumerate(zip(y_pred, y_test)):
    if pred != test:
        print("Sample index:", i, "mispredicted as:", pred, "true label:", test)
        n += 1
print("Total mispredicted samples:", n)
Sample index: 21 mispredicted as: 4 true label: 1
Sample index: 58 mispredicted as: 9 true label: 5
Sample index: 88 mispredicted as: 9 true label: 5
Sample index: 173 mispredicted as: 5 true label: 8
Sample index: 208 mispredicted as: 4 true label: 0
Sample index: 281 mispredicted as: 4 true label: 0
Sample index: 321 mispredicted as: 4 true label: 7
Sample index: 347 mispredicted as: 5 true label: 8
Sample index: 348 mispredicted as: 3 true label: 5
Total mispredicted samples: 9

Now, we can use Matplotlib to plot a mispredicted sample and see whether it is easily confused. Here we take sample 88, which was mispredicted as 9 although its true label is 5.

from matplotlib import pyplot as plt

%matplotlib inline

plt.imshow(X_test[88].reshape((8, 8)), cmap=plt.cm.gray_r)
<matplotlib.image.AxesImage at 0x14784ba00>
[Output image: 8 x 8 grayscale plot of the mispredicted sample]

Randomly select and plot a few mispredicted samples, and you will find that some are indeed not easy to distinguish, even with the human eye.
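
As a minimal sketch, the following displays the first four mispredicted test samples side by side, annotated with their predicted and true labels (the helper variable wrong is illustrative):

# Collect the indices of the mispredicted test samples
wrong = [i for i, (p, t) in enumerate(zip(y_pred, y_test)) if p != t]

# Plot the first four side by side with predicted and true labels
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for ax, i in zip(axes, wrong):
    ax.imshow(X_test[i].reshape((8, 8)), cmap=plt.cm.gray_r)
    ax.set_title(f"pred {y_pred[i]} / true {y_test[i]}")
    ax.axis("off")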

45.4. Dynamic Incremental Training#

Since the trained model still mispredicts some samples, what if we let the model learn from exactly those samples and manually tell it the correct labels? That is precisely incremental training.

In scikit-learn, incremental training is done with model.partial_fit(X, y), whose usage mirrors that of model.fit(X, y). One caveat: on a classifier that has never been fitted, the first call to partial_fit must be given the complete list of labels via its classes parameter; our model here is already fitted, so this is not needed.

Next, we will use the model that has been trained above to perform incremental learning on the mispredicted samples.

import numpy as np

addition_index = []
for i, (pred, test) in enumerate(zip(y_pred, y_test)):
    if pred != test:
        addition_index.append(i)

addition_X = X_test[addition_index]  # features of the mispredicted samples
addition_y = y_test[addition_index]  # correct labels of the mispredicted samples

# Incrementally train the model
model.partial_fit(addition_X, addition_y)
model
Iteration 51, loss = 2.24865593
MLPClassifier(max_iter=50, random_state=1, verbose=1)

Next, we use the model again to predict the test data and re-print the mispredicted samples.

y_pred = model.predict(X_test)  # Test the model
accuracy_score(y_test, y_pred)  # Accuracy

# Print the mispredicted samples
n = 0
for i, (pred, test) in enumerate(zip(y_pred, y_test)):
    if pred != test:
        print("Sample index:", i, "mispredicted as:", pred, "true label:", test)
        n += 1
print("Total mispredicted samples:", n)
Sample index: 75 mispredicted as: 6 true label: 4
Sample index: 88 mispredicted as: 9 true label: 5
Sample index: 172 mispredicted as: 8 true label: 3
Sample index: 173 mispredicted as: 5 true label: 8
Sample index: 229 mispredicted as: 2 true label: 3
Sample index: 248 mispredicted as: 8 true label: 3
Sample index: 249 mispredicted as: 5 true label: 7
Sample index: 281 mispredicted as: 4 true label: 0
Sample index: 347 mispredicted as: 5 true label: 8
Total mispredicted samples: 9

Comparing the two lists, several samples that were previously mispredicted (such as 21, 58, 208, 321, and 348) are now predicted correctly. The total number of errors has not dropped in this run, however: because the incremental samples change the model's parameters as a whole, some samples that were previously predicted correctly are now mispredicted.

If the total number of mispredicted samples does not decrease, you can execute the two cells above repeatedly so that the model keeps learning from the mispredicted samples; you should then see a more intuitive effect.
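
A minimal sketch of that repeat-and-relearn loop; the round count of 5 is an arbitrary choice:

# Repeatedly relearn whatever the model currently gets wrong
for round_ in range(5):
    y_pred = model.predict(X_test)
    wrong = [i for i, (p, t) in enumerate(zip(y_pred, y_test)) if p != t]
    print(f"Round {round_}: {len(wrong)} mispredicted samples")
    if not wrong:
        break  # everything is classified correctly
    model.partial_fit(X_test[wrong], y_test[wrong])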

Next, we will complete a more interesting exercise. The experiment builds a handwritten character recognition system that can be deployed online, so that characters drawn by users can be predicted.

The experiment provides pre-implemented code that lets you draw a character by hand in the Jupyter Notebook environment. Simply run the following cell.

from IPython.display import HTML

input_form = """
<table>
<td style="border-style: none;">
<div style="border: solid 2px #666; width: 43px; height: 44px;">
<canvas width="40" height="40"></canvas>
</div></td>
<td style="border-style: none;">
<button onclick="clear_value()">Clear</button>
</td>
</table>
"""

javascript = """
<script type="text/Javascript">
    var pixels = [];
    for (var i = 0; i < 8*8; i++) pixels[i] = 0;
    var click = 0;

    var canvas = document.querySelector("canvas");
    canvas.addEventListener("mousemove", function(e){
        if (e.buttons == 1) {
            click = 1;
            canvas.getContext("2d").fillStyle = "rgb(0,0,0)";
            canvas.getContext("2d").fillRect(e.offsetX, e.offsetY, 5, 5);
            x = Math.floor(e.offsetY * 0.2);
            y = Math.floor(e.offsetX * 0.2) + 1;
            for (var dy = 0; dy < 1; dy++){
                for (var dx = 0; dx < 1; dx++){
                    if ((x + dx < 8) && (y + dy < 8)){
                        pixels[(y+dy)+(x+dx)*8] = 1;
                    }
                }
            }
        } else {
            if (click == 1) set_value();
            click = 0;
        }
    });
    
    function set_value(){
        var result = ""
        for (var i = 0; i < 8*8; i++) result += pixels[i] + ","
        var kernel = IPython.notebook.kernel;
        kernel.execute("image = [" + result + "]");
        kernel.execute("f = open('digits.json', 'w')");
        kernel.execute("f.write('{\\"inputs\\":%s}' % image)");
        kernel.execute("f.close()");
    }
    
    function clear_value(){
        canvas.getContext("2d").fillStyle = "rgb(255,255,255)";
        canvas.getContext("2d").fillRect(0, 0, 40, 40);
        for (var i = 0; i < 8*8; i++) pixels[i] = 0;
    }
</script>
"""
randint = np.random.randint(0, 10)  # pick a random target digit from 0 to 9
print(f"Please carefully draw the handwritten digit {randint} in the box below")
HTML(input_form + javascript)
Please carefully draw the handwritten digit 2 in the box below

Since the drawing box is small, you can zoom in on the browser page to make writing with the mouse easier. The drawn character is automatically saved to the digits.json file in the current directory. Next, we read this file and plot the image.

import json
import numpy as np

with open("digits.json") as f:
    inputs = f.readlines()[0]
    inputs_array = np.array(json.loads(inputs)["inputs"])
plt.imshow(inputs_array.reshape((8, 8)), cmap=plt.cm.gray_r)
<matplotlib.image.AxesImage at 0x16307e320>
[Output image: 8 x 8 rendering of the hand-drawn character]

You will find that, because the resolution of the DIGITS images is only \(8 \times 8\) pixels, the processed image differs slightly from what you drew. Moreover, the image drawn above is binary: black pixels are stored as 1 and white pixels as 0. To make the model match this input, we retrain it below, replacing every value greater than 0 in digits.data with 1 and using all of the data for training.

# Retrain the neural network
digits.data[digits.data > 0] = 1
model = MLPClassifier(tol=0.001, max_iter=50, verbose=1)
model.fit(digits.data, digits.target)
Iteration 1, loss = 2.22309482
Iteration 2, loss = 1.98611114
Iteration 3, loss = 1.77759516
Iteration 4, loss = 1.57927044
Iteration 5, loss = 1.38663125
Iteration 6, loss = 1.20774216
Iteration 7, loss = 1.05083426
Iteration 8, loss = 0.92058288
Iteration 9, loss = 0.82050522
Iteration 10, loss = 0.73771197
Iteration 11, loss = 0.67522434
Iteration 12, loss = 0.61926095
Iteration 13, loss = 0.57464139
Iteration 14, loss = 0.53840080
Iteration 15, loss = 0.50641056
Iteration 16, loss = 0.47852272
Iteration 17, loss = 0.45534865
Iteration 18, loss = 0.43354940
Iteration 19, loss = 0.41585885
Iteration 20, loss = 0.39885888
Iteration 21, loss = 0.38530286
Iteration 22, loss = 0.37004597
Iteration 23, loss = 0.35894133
Iteration 24, loss = 0.34725008
Iteration 25, loss = 0.33895579
Iteration 26, loss = 0.32783111
Iteration 27, loss = 0.32029436
Iteration 28, loss = 0.31536069
Iteration 29, loss = 0.30647274
Iteration 30, loss = 0.29730023
Iteration 31, loss = 0.28933556
Iteration 32, loss = 0.28507046
Iteration 33, loss = 0.27861768
Iteration 34, loss = 0.27280789
Iteration 35, loss = 0.26719081
Iteration 36, loss = 0.26241597
Iteration 37, loss = 0.25833709
Iteration 38, loss = 0.25386779
Iteration 39, loss = 0.24923674
Iteration 40, loss = 0.24560314
Iteration 41, loss = 0.24193601
Iteration 42, loss = 0.23912347
Iteration 43, loss = 0.23715456
Iteration 44, loss = 0.23219586
Iteration 45, loss = 0.22928455
Iteration 46, loss = 0.22426162
Iteration 47, loss = 0.22252067
Iteration 48, loss = 0.21962222
Iteration 49, loss = 0.21669993
Iteration 50, loss = 0.21438963
MLPClassifier(max_iter=50, tol=0.001, verbose=1)

Next, you can use the newly trained model to predict the handwritten character you drew yourself. We perform incremental training on every prediction result to improve the model: if the prediction is correct, the incremental step incorporates the sample into the model; if it is incorrect, the incremental step helps correct the model.

inputs_array = np.atleast_2d(inputs_array)  # reshape into a 2-D array
result = model.predict(inputs_array)  # predict

if result != randint:
    print(f"Prediction wrong | predicted label: {result} | true label: {randint}")
else:
    print(f"Prediction correct | predicted label: {result} | true label: {randint}")

# Incrementally train on the sample with its true label in either case
model.partial_fit(inputs_array, np.atleast_1d(randint))
print("Incremental training complete")
Prediction correct | predicted label: [2] | true label: 2
Iteration 51, loss = 0.19804450
Incremental training complete

Since the neural network outputs a probability for each label, we can finally take a look at the basis for the network's judgment about which category the input image belongs to.

# Output the network's predicted probability for each class
pred_proba = model.predict_proba(np.atleast_2d(inputs_array))

# Plot a bar chart
plt.xticks(range(10))
plt.bar(range(10), pred_proba[0], align="center")
<BarContainer object of 10 artists>
[Output image: bar chart of the predicted probability for each class]

The taller a bar in the chart above, the higher the probability the network assigns to that category for the input image.

You can execute the two cells above repeatedly, that is, perform incremental training on your own handwritten characters again and again. You should see the probability of the correct label grow higher and higher. This is the intuitive effect of optimizing a model through incremental training.

45.5. Summary#

In this experiment, we learned about the static training and dynamic training processes of machine learning models, and specifically studied dynamic incremental training. Incremental training has a wide range of applications in the field of machine learning engineering. Models deployed online need to be continuously improved to perform better and better.

In fact, by borrowing the deployment ideas from the previous experiment, you can implement an online real-time handwritten character recognition application that collects each recognition result and uses it to incrementally train the model. Doing so requires some familiarity with Web frameworks such as Flask. If you are interested, try building this example.
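
Here is a minimal sketch of such a service, assuming Flask and joblib are available; the /predict route and the model.joblib path are illustrative assumptions, not part of this experiment:

import numpy as np
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.joblib")  # a previously saved, fitted MLPClassifier

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"inputs": [64 binary pixel values], "label": 3}
    data = request.get_json()
    x = np.atleast_2d(data["inputs"])
    pred = int(model.predict(x)[0])
    if "label" in data:
        # The user confirmed the true label, so learn from this sample
        model.partial_fit(x, np.atleast_1d(data["label"]))
        joblib.dump(model, "model.joblib")  # persist the updated model
    return jsonify({"prediction": pred})

if __name__ == "__main__":
    app.run()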
