12. Gradient Descent Method Implementation and Application
12.1. Introduction
In the logistic regression experiment, we learned to use the gradient descent method to solve problems. As an optimization method, gradient descent can be widely used in the optimization process of parameter problems. To better understand the advantages of this method and its usage process, in this challenge, we will try to use gradient descent to solve the linear regression problem we learned before.
12.2. Key Points
- Solving linear regression parameters using the least squares method
- Solving linear regression parameters using the gradient descent method
Do you still remember the “linear regression” we learned? We used ordinary least squares (OLS) to solve the fitting parameters of linear regression. Of course, according to different calculation methods, OLS can be further divided into algebraic solution and matrix solution.
First, let’s review the algebraic solution process of ordinary least squares for linear regression. For a univariate linear equation:

\[
y = w_0 + w_1x \tag{1}
\]

Define its squared loss function as:

\[
f = \sum_{i=1}^{n}\left(y_i - (w_0 + w_1x_i)\right)^2 \tag{2}
\]

Next, find the first-order partial derivatives of the squared loss function:

\[
\frac{\partial f}{\partial w_{0}} = -2\left(\sum_{i=1}^{n}y_i - nw_0 - w_1\sum_{i=1}^{n}x_i\right) \tag{3a}
\]

\[
\frac{\partial f}{\partial w_{1}} = -2\left(\sum_{i=1}^{n}x_iy_i - w_0\sum_{i=1}^{n}x_i - w_1\sum_{i=1}^{n}x_i^{2}\right) \tag{3b}
\]

To find the values of \(w_0\) and \(w_1\), we set \(\frac{\partial f}{\partial w_{0}} = 0\) and \(\frac{\partial f}{\partial w_{1}} = 0\), and solve to get:

\[
w_1 = \frac{n\sum_{i=1}^{n}x_iy_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}},\qquad
w_0 = \frac{\sum_{i=1}^{n}x_i^{2}\sum_{i=1}^{n}y_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}x_iy_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}} \tag{4}
\]
Supplementary: detailed derivation from formula (3) to formula (4)

We start from equations (3a) and (3b).

Setting the right side of equation (3a) equal to 0, we get:

\[
\sum_{i=1}^{n}y_i - nw_0 - w_1\sum_{i=1}^{n}x_i = 0 \tag{5}
\]

Setting the right side of equation (3b) equal to 0, we get:

\[
\sum_{i=1}^{n}x_iy_i - w_0\sum_{i=1}^{n}x_i - w_1\sum_{i=1}^{n}x_i^{2} = 0 \tag{6}
\]

We now have two equations, (5) and (6), containing the two unknowns \(w_0\) and \(w_1\). Through algebraic manipulation, we can solve for the expressions of \(w_0\) and \(w_1\).

First, rearrange equation (5) to express \(w_1\) as \(w_1 = \frac{\sum_{i=1}^{n}y_i - nw_0}{\sum_{i=1}^{n}x_i}\), and substitute this expression into equation (6). Solving the resulting equation for \(w_0\) gives:

\[
w_0 = \frac{\sum_{i=1}^{n}x_i^{2}\sum_{i=1}^{n}y_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}x_iy_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}}
\]

Substituting this \(w_0\) back into equation (5), we can get the expression for \(w_1\):

\[
w_1 = \frac{n\sum_{i=1}^{n}x_iy_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^{2} - \left(\sum_{i=1}^{n}x_i\right)^{2}}
\]

These are exactly the expressions in formula (4).
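If you would rather not grind through this algebra by hand, the short sketch below uses SymPy (an extra library that this challenge does not otherwise require) to build the squared loss for three symbolic sample points, set both partial derivatives to zero, and solve the system. The symbolic solution should be algebraically equivalent to formula (4) with \(n = 3\).

import sympy as sp

# Three symbolic sample points are enough to illustrate the derivation
w0, w1 = sp.symbols('w0 w1')
xs = sp.symbols('x1:4')  # x1, x2, x3
ys = sp.symbols('y1:4')  # y1, y2, y3

# Squared loss from formula (2)
f = sum((ys[i] - (w0 + w1 * xs[i])) ** 2 for i in range(3))

# Set the partial derivatives (3a) and (3b) to zero and solve the system
solution = sp.solve([sp.diff(f, w0), sp.diff(f, w1)], [w0, w1], dict=True)[0]
print(sp.simplify(solution[w1]))
print(sp.simplify(solution[w0]))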
So far, we have solved the parameters of the linear equation algebraically. Next, we use Python code to implement the OLS algebraic solution function.
Exercise 12.1
Challenge: Complete the ordinary least squares algebraic calculation function according to formula (4).
def ols_algebra(x, y):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (≈ 3 lines of code)
    w1 = None
    w0 = None
    ### End of code ###
    return w1, w0
Solution to Exercise 12.1
def ols_algebra(x, y):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (≈ 3 lines of code)
    n = len(x)
    # Closed-form OLS estimates according to formula (4)
    w1 = (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x * x) - sum(x) * sum(x))
    w0 = (sum(x * x) * sum(y) - sum(x) * sum(x * y)) / (n * sum(x * x) - sum(x) * sum(x))
    ### End of code ###
    return w1, w0
Next, we provide a set of test data.
import numpy as np
x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73])
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93])
Run the tests
w1, w0 = ols_algebra(x, y)
round(w1, 3), round(w0, 3)
Expected output
(0.718, 44.256)
So far, we have obtained the final calculation result of the ordinary least squares fit:

\[
y = 0.718x + 44.256
\]
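As a quick cross-check (this step is not part of the original challenge), NumPy's built-in least-squares polynomial fit should produce essentially the same slope and intercept as our ols_algebra function.

# Cross-check with NumPy's built-in degree-1 least-squares fit: returns (slope, intercept)
w1_np, w0_np = np.polyfit(x, y, deg=1)
round(w1_np, 3), round(w0_np, 3)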
In addition to ordinary least squares, the linear regression problem can also be solved using iterative methods. Here, we apply the gradient descent method learned in the logistic regression experiment to solve it.

We still use the squared loss function above. When using the iterative method, only one step needs to change: instead of setting \(\frac{\partial f}{\partial w_{0}} = 0\) and \(\frac{\partial f}{\partial w_{1}} = 0\), we use the gradient descent method and repeatedly update the parameters in the direction opposite to the gradient:

\[
w_0 \leftarrow w_0 - \eta\frac{\partial f}{\partial w_{0}},\qquad
w_1 \leftarrow w_1 - \eta\frac{\partial f}{\partial w_{1}} \tag{5}
\]

where \(\eta\) is the learning rate.
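For reference, substituting the partial derivatives (3a) and (3b) into the update rule (5), and writing \(\hat{y}_i = w_0 + w_1x_i\) for the current prediction, gives the explicit updates:

\[
w_1 \leftarrow w_1 + 2\eta\sum_{i=1}^{n}x_i\left(y_i - \hat{y}_i\right),\qquad
w_0 \leftarrow w_0 + 2\eta\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)
\]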
The following uses Python code to implement the solution process of the gradient descent method.
Exercise 12.2
Challenge: Complete the gradient descent calculation function for linear regression according to formulas (3) and (5).
def ols_gradient_descent(x, y, lr, num_iter):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables
    lr -- Learning rate
    num_iter -- Number of iterations

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (> 5 lines of code)
    w1 = None
    w0 = None
    ### End of code ###
    return w1, w0
Solution to Exercise 12.2
def ols_gradient_descent(x, y, lr, num_iter):
    """
    Parameters:
    x -- Array of independent variables
    y -- Array of dependent variables
    lr -- Learning rate
    num_iter -- Number of iterations

    Returns:
    w1 -- Coefficient of the linear equation
    w0 -- Intercept term of the linear equation
    """
    ### Start of code ### (> 5 lines of code)
    w1 = 0  # Initialize the coefficient to 0
    w0 = 0  # Initialize the intercept to 0
    for i in range(num_iter):  # Gradient descent iteration
        # Calculate the current prediction
        y_hat = (w1 * x) + w0
        # Calculate the gradient for each parameter (formulas 3a and 3b)
        w1_gradient = -2 * sum(x * (y - y_hat))
        w0_gradient = -2 * sum(y - y_hat)
        # Update the parameters along the negative gradient (formula 5)
        w1 -= lr * w1_gradient
        w0 -= lr * w0_gradient
    ### End of code ###
    return w1, w0
Run tests
w1_, w0_ = ols_gradient_descent(x, y, lr=0.00001, num_iter=100)
round(w1_, 3), round(w0_, 3)
Expected output
(1.264, 0.038)
So far, we have obtained the final calculation result of the gradient descent fit:

\[
y = 1.264x + 0.038
\]
You will find that the result of the iterative method differs somewhat from the exact solution computed by OLS above. The difference is mainly in the intercept term: with only 100 iterations and such a small learning rate, the intercept has not yet fully converged. Since the intercept has a relatively small effect on the fitted line over this range of data, we can still compare the fitting results of the two methods by plotting.
from matplotlib import pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
axes[0].scatter(x, y)
axes[0].plot(np.array([50, 110]), np.array([50, 110]) * w1 + w0, "r")
axes[0].set_title("OLS")
axes[1].scatter(x, y)
axes[1].plot(np.array([50, 110]), np.array([50, 110]) * w1_ + w0_, "r")
axes[1].set_title("Gradient descent")
As the plots show, the two fitted lines are still very close.
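The remaining gap in the intercept mainly indicates that 100 iterations with such a small learning rate are not enough for \(w_0\) to converge. The sketch below (the helper name ols_gd_with_loss and the larger iteration count are our own choices, not part of the original challenge) reruns the same update rule while recording the squared loss; you should see that the loss is still decreasing after the first 100 iterations, and that running longer moves the parameters closer to the OLS solution.

def ols_gd_with_loss(x, y, lr, num_iter):
    """Same update rule as ols_gradient_descent, but also records the squared loss."""
    w1, w0 = 0, 0
    losses = []
    for i in range(num_iter):
        y_hat = (w1 * x) + w0
        losses.append(sum((y - y_hat) ** 2))  # squared loss from formula (2)
        w1 -= lr * (-2 * sum(x * (y - y_hat)))
        w0 -= lr * (-2 * sum(y - y_hat))
    return w1, w0, losses

w1_long, w0_long, losses = ols_gd_with_loss(x, y, lr=0.00001, num_iter=100000)
round(w1_long, 3), round(w0_long, 3), round(losses[100], 3), round(losses[-1], 3)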
Ordinary least squares can be used to solve the linear regression problem because the minimum of its loss function can be found analytically. However, many problems in machine learning involve far more complex loss functions whose minima generally cannot be obtained directly, so we can only use iterative methods to find local or global minima. This is why we learn iterative methods such as gradient descent.