
51. TensorFlow California Housing Price Prediction

51.1. Introduction

Previously, we learned how TensorFlow works and looked at some of its common components. Next, we will implement a linear regression using TensorFlow. Linear regression may seem very basic, but the main purpose here is to become familiar with the whole process of building a model with TensorFlow and with the important concepts involved.

51.2. Key Points

  • Least Squares Linear Regression

  • Basic TensorFlow Operations

In this challenge, we predict housing prices in California. We build a data flow graph of a linear regression model using TensorFlow and calculate the corresponding weights (denoted as W in the code) using the least squares method.

First, the challenge requires loading the sample dataset. Here, we use the California housing dataset provided by scikit-learn, which is downloaded and cached on first use. In the returned object, housing.data holds the feature data and housing.target holds the target data.

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()  # California housing dataset

housing.data.shape, housing.target.shape
((20640, 8), (20640,))
housing.data[0], housing.target[0]  # Preview the features and target value of the first sample
(array([   8.3252    ,   41.        ,    6.98412698,    1.02380952,
         322.        ,    2.55555556,   37.88      , -122.23      ]),
 4.526)

As can be seen, the dataset contains 20,640 samples, each with 8 features. The target value is the housing price, so this is a typical regression problem. Below, we use the matrix form of the least squares method to calculate the fitting coefficients for this multiple linear regression problem.

First, we recall the matrix form of linear regression and its least squares solution:

\[ y = XW \]
\[ W=(X^TX)^{-1}X^Ty \]
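
As a brief sketch of where this closed form comes from, assume that X has shape n × d, W has shape d × 1, and that X^TX is invertible. Least squares minimizes the squared residual, and setting its gradient with respect to W to zero gives the formula above:

\[ L(W) = \lVert y - XW \rVert^2 = (y - XW)^T (y - XW) \]
\[ \frac{\partial L}{\partial W} = -2X^T(y - XW) = 0 \;\Rightarrow\; X^TXW = X^Ty \;\Rightarrow\; W = (X^TX)^{-1}X^Ty \]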

As in earlier experiments, we need to append a column of ones to the feature data so that the corresponding coefficient acts as the intercept term in the calculation. Additionally, we need to convert the target values into a two-dimensional column array so that the matrix products below have compatible shapes.

\[\begin{split} X = \begin{bmatrix} x_{11}&\cdots&x_{18}\\ x_{21}&\cdots&x_{28}\\ \cdots&\cdots&\cdots\\ x_{n1}&\cdots&x_{n8}\\ \end{bmatrix} \rightarrow \begin{bmatrix} x_{11}&\cdots&x_{18}&1\\ x_{21}&\cdots&x_{28}&1\\ \cdots&\cdots&\cdots&1\\ x_{n1}&\cdots&x_{n8}&1\\ \end{bmatrix} \end{split}\]
\[ y = \left [ y_1, y_2, \cdots, y_n \right ] \rightarrow \left [ \left [y_1 \right ], \left [y_2 \right ], \left [\cdots \right ], \left [y_n \right ] \right ] \]
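
The two transformations above can be tried on a tiny array first. The following is a minimal sketch on made-up toy values (the names features, target, X_toy, and y_toy are illustrative and not part of the challenge); np.hstack and reshape are one possible choice among several NumPy routines that would work.

```python
import numpy as np

# Toy feature matrix (3 samples, 2 features) and 1-D target; values are made up
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
target = np.array([10.0, 20.0, 30.0])

# Append a column of ones on the right so the last coefficient acts as the intercept
X_toy = np.hstack([features, np.ones((features.shape[0], 1))])

# Turn the 1-D target of shape (3,) into a column of shape (3, 1)
y_toy = target.reshape(-1, 1)

print(X_toy.shape, y_toy.shape)
```

The ones column is placed last to match the matrix layout shown above, which is why the intercept ends up as the final entry of the coefficient vector.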

Exercise 51.1

Challenge: Append a column of ones to the feature matrix and convert the target array into a two-dimensional NumPy array.

import numpy as np

## Code start ### (≈2 lines of code)
X = None
y = None
## Code end ###

Run the test

X.shape, y.shape

Expected output

((20640, 9), (20640, 1))

Next, we use the mathematical operations provided by TensorFlow to obtain the final fitting coefficients. TensorFlow 2 executes operations eagerly, so no session needs to be created.
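
Before working on the real data, the least squares formula can be checked on a small synthetic problem. The following is a minimal sketch, not the reference solution: the toy data and the names X_t, y_t, and W_toy are assumptions for illustration, and tf.transpose, tf.matmul, and tf.linalg.inv are only one possible set of TensorFlow operations for this calculation.

```python
import numpy as np
import tensorflow as tf

# Hypothetical noise-free toy data: y = 2*x1 - 3*x2 + 5
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
X_np = np.hstack([features, np.ones((50, 1))])  # ones column for the intercept
y_np = (features @ np.array([2.0, -3.0]) + 5.0).reshape(-1, 1)

# Convert the NumPy arrays into float64 tensors
X_t = tf.constant(X_np, dtype=tf.float64)
y_t = tf.constant(y_np, dtype=tf.float64)

# W = (X^T X)^{-1} X^T y, using TensorFlow operations only
XT = tf.transpose(X_t)
W_toy = tf.matmul(tf.matmul(tf.linalg.inv(tf.matmul(XT, X_t)), XT), y_t)

print(W_toy.numpy().ravel())  # should be close to [2, -3, 5]
```

Explicitly inverting X^TX mirrors the formula, but it can be numerically fragile when features are strongly correlated; a least squares solver is usually the more robust choice in practice.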

Exercise 51.2

Challenge: Use the methods provided by TensorFlow to complete the calculation of the fitting coefficients for linear regression.

Requirement: Convert the features and target values into tensors, and use only the operations provided by TensorFlow 2 to complete the calculation. It is recommended to search the official documentation for the mathematical operations you need.

import tensorflow as tf

## Code start ### (≈6 lines of code)
W = None
## Code end ###

Run the test

W

Expected output

tf.Tensor: id=1, shape=(9, 1), dtype=float64, numpy=
array([[ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01],
       [-3.69419202e+01]])

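If you build a model with the linear regression estimator provided by scikit-learn, you will find that the results agree up to floating-point rounding: model.coef_ corresponds to the first 8 entries of W, and model.intercept_ corresponds to the last entry.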

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(housing.data, housing.target)
model.coef_, model.intercept_
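
The correspondence between the two results can also be checked programmatically. The following is a minimal sketch on synthetic data (the data and the names X_aug, W_ne, toy_model, and sk_params are assumptions for illustration, and NumPy stands in for TensorFlow to keep the example short): the normal-equation solution is compared with coef_ and intercept_ joined into a single vector.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: 3 features, intercept 4.0, small Gaussian noise
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 3))
target = features @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

# Normal-equation solution with the ones column appended last
X_aug = np.hstack([features, np.ones((200, 1))])
W_ne = np.linalg.inv(X_aug.T @ X_aug) @ X_aug.T @ target.reshape(-1, 1)

# scikit-learn stores the slopes and the intercept separately
toy_model = LinearRegression().fit(features, target)
sk_params = np.append(toy_model.coef_, toy_model.intercept_)

# Compare within floating-point tolerance rather than with ==
print(np.allclose(W_ne.ravel(), sk_params))
```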
