51. TensorFlow California Housing Price Prediction#
51.1. Introduction#
In earlier sections, we looked at how TensorFlow works and at some of its common components. Next, we will implement a linear regression using TensorFlow. Linear regression may seem basic, but the main purpose here is to become familiar with the entire process of building a model with TensorFlow and the important concepts involved.
51.2. Key Points#
Least Squares Linear Regression
Basic TensorFlow Operations
We attempt to predict housing prices in California. Using TensorFlow, we build a linear regression model and compute the corresponding weights (denoted as W in the code) with the least squares method.
First, the challenge requires loading the sample dataset. Here, we use the dataset built into scikit-learn, where housing.data holds the feature data and housing.target holds the target values.
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()  # California housing price dataset
housing.data.shape, housing.target.shape
((20640, 8), (20640,))
housing.data[0], housing.target[0]  # Preview the first sample's features and target value
(array([   8.3252    ,   41.        ,    6.98412698,    1.02380952,
         322.        ,    2.55555556,   37.88      , -122.23      ]),
 4.526)
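To see what the 8 feature columns represent, you can also inspect the feature names stored on the dataset object returned by scikit-learn (an optional quick check):
housing.feature_names  # Names of the 8 feature columns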
As can be seen, there are 20,640 samples in total, each with 8 features. The target value is the house price, so this is a typical regression problem. Below, we therefore use the matrix form of the least squares method to compute the fitting coefficients for this multiple linear regression problem.
First, we recall the formula obtained from the matrix derivation of linear regression, i.e. the normal equation:

$$
\hat{W} = (X^T X)^{-1} X^T y
$$
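As a brief reminder of where this comes from (standard least squares reasoning, with $X$ including the column of ones added below):

$$
L(W) = \|XW - y\|^2, \qquad
\frac{\partial L}{\partial W} = 2X^T(XW - y) = 0
\;\Rightarrow\; X^T X W = X^T y
\;\Rightarrow\; \hat{W} = (X^T X)^{-1} X^T y
$$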
Based on previous experience, we need to append a column of ones to the feature data, which acts as the coefficient of the intercept term during the calculation. In addition, we need to convert the target values into a corresponding two-dimensional array for convenience in subsequent calculations.
Exercise 51.1
Challenge: Perform the operation of supplementing 1 to the feature matrix and convert the target array into a two-dimensional NumPy array.
import numpy as np
### Start of code ### (≈2 lines of code)
X = None
y = None
### End of code ###
Solution to Exercise 51.1
import numpy as np
### Start of code ### (≈2 lines of code)
X = np.append(housing.data, np.ones((housing.data.shape[0], 1)), axis=1)
y = housing.target.reshape(-1, 1)
### End of code ###
Run the tests
X.shape, y.shape
Expected output
((20640, 9), (20640, 1))
Next, we use the mathematical operations provided by TensorFlow to compute the final fitting coefficients. Since TensorFlow 2 executes eagerly, the result is available as soon as the operations run, without creating a session.
Exercise 51.2
Challenge: Use the methods provided by TensorFlow to complete the calculation of the fitting coefficients for linear regression.
Requirement: Convert the features and target values into tensors, and use only the operations provided by TensorFlow 2 to complete the calculation. It is recommended to search the official documentation for the mathematical operations you need.
import tensorflow as tf
### Start of code ### (≈6 lines of code)
W = None
### End of code ###
Solution to Exercise 51.2
import tensorflow as tf

### Start of code ### (≈6 lines of code)
X = tf.constant(X)  # Convert the feature matrix into a constant tensor
y = tf.constant(y)  # Convert the target values into a constant tensor
XT = tf.transpose(X)
# Compute the fitting coefficients according to the normal equation W = (X^T X)^{-1} X^T y
W = tf.matmul(tf.matmul(tf.linalg.inv(tf.matmul(XT, X)), XT), y)
### End of code ###
Run the test
W
Expected output
<tf.Tensor: id=1, shape=(9, 1), dtype=float64, numpy=
array([[ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01],
       [-3.69419202e+01]])>
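As a side note, TensorFlow 2 also provides a dedicated least-squares solver, tf.linalg.lstsq, which avoids explicitly inverting $X^T X$ and is usually more numerically stable. A minimal sketch that should give essentially the same result (up to floating-point differences):
W_lstsq = tf.linalg.lstsq(X, y)  # Solve min ||XW - y||^2 directly instead of inverting X^T X
W_lstsq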
If you build a model with the linear regression method provided by scikit-learn, you will find that the results are essentially identical.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(housing.data, housing.target)
model.coef_, model.intercept_
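Because the column of ones was appended as the last column of X, the last entry of W is the intercept and the first 8 entries are the feature coefficients. A quick sanity check (a sketch, assuming W is the eager tensor computed above):
import numpy as np

W_np = W.numpy()  # Convert the eager tensor back into a NumPy array
np.allclose(model.coef_, W_np[:8, 0]), np.isclose(model.intercept_, W_np[8, 0])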