12. Gradient Descent Method Implementation and Application
12.1. Introduction
In the logistic regression experiment, we learned to use the gradient descent method to solve problems. As an optimization method, gradient descent can be widely used in the optimization process of parameter problems. To better understand the advantages of this method and its usage process, in this challenge, we will try to use gradient descent to solve the linear regression problem we learned before.
12.2. Key Points
- Solving linear regression parameters using the least squares method
- Solving linear regression parameters using the gradient descent method
Do you still remember the “linear regression” we learned? We used ordinary least squares (OLS) to solve the fitting parameters of linear regression. Of course, according to different calculation methods, OLS can be further divided into algebraic solution and matrix solution.
First, let’s review the algebraic solution process of ordinary least squares for linear regression. For a univariate linear equation:

\[
y = w_0 + w_1x \tag{1}
\]

Define its squared loss function as:

\[
f = \sum_{i=1}^{n}\left(y_i - (w_0 + w_1x_i)\right)^2 \tag{2}
\]

Next, find the first-order partial derivatives of the squared loss function:

\[
\frac{\partial f}{\partial w_{0}} = -2\left(\sum_{i=1}^{n}y_i - nw_0 - w_1\sum_{i=1}^{n}x_i\right) \tag{3a}
\]

\[
\frac{\partial f}{\partial w_{1}} = -2\left(\sum_{i=1}^{n}x_iy_i - w_0\sum_{i=1}^{n}x_i - w_1\sum_{i=1}^{n}x_i^{2}\right) \tag{3b}
\]

To find the values of \(w_0\) and \(w_1\), we set \(\frac{\partial f}{\partial w_{0}} = 0\) and \(\frac{\partial f}{\partial w_{1}} = 0\), and solve to get:

\[
w_1 = \frac{n\sum_{i=1}^{n}x_iy_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}},\qquad
w_0 = \frac{\sum_{i=1}^{n}x_i^{2}\sum_{i=1}^{n}y_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}x_iy_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}} \tag{4}
\]
Supplementary: detailed derivation from formula (3) to formula (4)

We start from equations (3a) and (3b).

Setting the right side of equation (3a) equal to 0, we get:

\[
\sum_{i=1}^{n}y_i - nw_0 - w_1\sum_{i=1}^{n}x_i = 0 \tag{5}
\]

Setting the right side of equation (3b) equal to 0, we get:

\[
\sum_{i=1}^{n}x_iy_i - w_0\sum_{i=1}^{n}x_i - w_1\sum_{i=1}^{n}x_i^{2} = 0 \tag{6}
\]

We now have two equations, (5) and (6), containing the two unknowns \(w_0\) and \(w_1\). Through algebraic manipulation, we can solve for the expressions of \(w_0\) and \(w_1\).

First, rearrange equation (5) to express \(w_1\) as \(w_1 = \frac{\sum_{i=1}^{n}y_i - nw_0}{\sum_{i=1}^{n}x_i}\), and substitute this expression into equation (6). Solving the resulting equation for \(w_0\) gives:

\[
w_0 = \frac{\sum_{i=1}^{n}x_i^{2}\sum_{i=1}^{n}y_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}x_iy_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}}
\]

Substituting this \(w_0\) back into equation (5), we can get the expression for \(w_1\):

\[
w_1 = \frac{n\sum_{i=1}^{n}x_iy_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}}
\]

These are exactly the expressions in formula (4).
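If you would rather not grind through this algebra by hand, the short sketch below uses SymPy (an extra library that this challenge does not otherwise require) to build the squared loss for three symbolic sample points, set both partial derivatives to zero, and solve the system. The symbolic solution should be algebraically equivalent to formula (4) with \(n = 3\).

import sympy as sp

# Three symbolic sample points are enough to illustrate the derivation
w0, w1 = sp.symbols('w0 w1')
xs = sp.symbols('x1:4')  # x1, x2, x3
ys = sp.symbols('y1:4')  # y1, y2, y3

# Squared loss from formula (2)
f = sum((ys[i] - (w0 + w1 * xs[i])) ** 2 for i in range(3))

# Set the partial derivatives (3a) and (3b) to zero and solve the system
solution = sp.solve([sp.diff(f, w0), sp.diff(f, w1)], [w0, w1], dict=True)[0]
print(sp.simplify(solution[w1]))
print(sp.simplify(solution[w0]))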
So far, we have solved the parameters of the linear equation algebraically. Next, we use Python code to implement the OLS algebraic solution function.
Exercise 12.1
Challenge: Complete the ordinary least squares algebraic calculation function according to formula (4).
def ols_algebra(x, y):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (≈ 3 lines of code)
    w1 = None
    w0 = None
    ### End of code ###
    return w1, w0
Solution to Exercise 12.1
def ols_algebra(x, y):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (≈ 3 lines of code)
    n = len(x)
    # Closed-form OLS estimates according to formula (4)
    w1 = (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x * x) - sum(x) * sum(x))
    w0 = (sum(x * x) * sum(y) - sum(x) * sum(x * y)) / (n * sum(x * x) - sum(x) * sum(x))
    ### End of code ###
    return w1, w0
Next, we provide a set of test data.
import numpy as np
x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73])
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93])
Run the tests
w1, w0 = ols_algebra(x, y)
round(w1, 3), round(w0, 3)
Expected output
(0.718, 44.256)
So far, we have obtained the final calculation result of the ordinary least squares fit:

\[
y = 0.718x + 44.256
\]
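As a quick cross-check (this step is not part of the original challenge), NumPy's built-in least-squares polynomial fit should produce essentially the same slope and intercept as our ols_algebra function.

# Cross-check with NumPy's built-in degree-1 least-squares fit: returns (slope, intercept)
w1_np, w0_np = np.polyfit(x, y, deg=1)
round(w1_np, 3), round(w0_np, 3)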
In addition to ordinary least squares, the linear regression problem can also be solved using iterative methods. Here, we apply the gradient descent method learned in the logistic regression experiment to solve it.

We still use the squared loss function above. When using the iterative method, only one step needs to change: instead of setting \(\frac{\partial f}{\partial w_{0}} = 0\) and \(\frac{\partial f}{\partial w_{1}} = 0\), we use the gradient descent method and repeatedly update the parameters in the direction opposite to the gradient:

\[
w_0 \leftarrow w_0 - \eta\frac{\partial f}{\partial w_{0}},\qquad
w_1 \leftarrow w_1 - \eta\frac{\partial f}{\partial w_{1}} \tag{5}
\]

where \(\eta\) is the learning rate.
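For reference, substituting the partial derivatives (3a) and (3b) into the update rule (5), and writing \(\hat{y}_i = w_0 + w_1x_i\) for the current prediction, gives the explicit updates:

\[
w_1 \leftarrow w_1 + 2\eta\sum_{i=1}^{n}x_i\left(y_i - \hat{y}_i\right),\qquad
w_0 \leftarrow w_0 + 2\eta\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)
\]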
The following uses Python code to implement the solution process of the gradient descent method.
Exercise 12.2
Challenge: Complete the gradient descent calculation function for linear regression according to formulas (3) and (5).
def ols_gradient_descent(x, y, lr, num_iter):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables
    lr -- Learning rate
    num_iter -- Number of iterations

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (> 5 lines of code)
    w1 = None
    w0 = None
    ### End of code ###
    return w1, w0
Solution to Exercise 12.2
def ols_gradient_descent(x, y, lr, num_iter):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables
    lr -- Learning rate
    num_iter -- Number of iterations

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (> 5 lines of code)
    w1 = 0  # Initialize the coefficient to 0
    w0 = 0  # Initialize the intercept to 0
    for i in range(num_iter):  # Gradient descent iteration
        # Calculate the current prediction
        y_hat = (w1 * x) + w0
        # Calculate the gradient for each parameter (formulas 3a and 3b)
        w1_gradient = -2 * sum(x * (y - y_hat))
        w0_gradient = -2 * sum(y - y_hat)
        # Update the parameters along the negative gradient (formula 5)
        w1 -= lr * w1_gradient
        w0 -= lr * w0_gradient
    ### End of code ###
    return w1, w0
Run tests
w1_, w0_ = ols_gradient_descent(x, y, lr=0.00001, num_iter=100)
round(w1_, 3), round(w0_, 3)
Expected output
(1.264, 0.038)
So far, we have obtained the final calculation result of the gradient descent fit:

\[
y = 1.264x + 0.038
\]
You will find that the result of the iterative method differs somewhat from the exact solution computed by OLS above. The difference is mainly in the intercept term: with only 100 iterations and such a small learning rate, the intercept has not yet fully converged. Since the intercept has a relatively small effect on the fitted line over this range of data, we can still compare the fitting results of the two methods by plotting.
from matplotlib import pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
axes[0].scatter(x, y)
axes[0].plot(np.array([50, 110]), np.array([50, 110]) * w1 + w0, "r")
axes[0].set_title("OLS")
axes[1].scatter(x, y)
axes[1].plot(np.array([50, 110]), np.array([50, 110]) * w1_ + w0_, "r")
axes[1].set_title("Gradient descent")
As the plots show, the two fitted lines are still very close.
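The remaining gap in the intercept mainly indicates that 100 iterations with such a small learning rate are not enough for \(w_0\) to converge. The sketch below (the helper name ols_gd_with_loss and the larger iteration count are our own choices, not part of the original challenge) reruns the same update rule while recording the squared loss; you should see that the loss is still decreasing after the first 100 iterations, and that running longer moves the parameters closer to the OLS solution.

def ols_gd_with_loss(x, y, lr, num_iter):
    """Same update rule as ols_gradient_descent, but also records the squared loss."""
    w1, w0 = 0, 0
    losses = []
    for i in range(num_iter):
        y_hat = (w1 * x) + w0
        losses.append(sum((y - y_hat) ** 2))  # squared loss from formula (2)
        w1 -= lr * (-2 * sum(x * (y - y_hat)))
        w0 -= lr * (-2 * sum(y - y_hat))
    return w1, w0, losses

w1_long, w0_long, losses = ols_gd_with_loss(x, y, lr=0.00001, num_iter=100000)
round(w1_long, 3), round(w0_long, 3), round(losses[100], 3), round(losses[-1], 3)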
Ordinary least squares can be used to solve the linear regression problem because the minimum of its loss function can be found analytically. However, many problems in machine learning involve far more complex loss functions whose minima generally cannot be obtained directly, so we can only use iterative methods to find local or global minima. This is why we learn iterative methods such as gradient descent.