
6. Bitcoin Price Prediction and Plotting

6.1. Introduction

Bitcoin (abbreviation: BTC) is considered by some to be a decentralized electronic cryptocurrency, although it is not universally accepted as a means of payment, and most countries treat it as a virtual commodity rather than a currency. Bitcoin was created by Satoshi Nakamoto (a pseudonym) and launched on January 3, 2009; it runs on a borderless peer-to-peer network using consensus-driven open-source software.

6.2. Key Points

Data Preparation
3rd Degree Polynomial Regression Prediction Challenge
Nth Degree Polynomial Regression Prediction Plotting

Since its emergence, Bitcoin has remained the digital currency with the largest total market value. Its value has long been highly controversial: some people think it is a serious “bubble”, while others think it is worth the price. Whichever view one holds, Bitcoin's price has clearly risen and fallen sharply. This challenge uses historical Bitcoin data collected from 2010 to 2018, including transaction prices, block counts, transaction fees, and more. We will use polynomial regression and ridge regression methods to try to predict the trend of Bitcoin's price.

6.3. Data Preparation

First, load the Bitcoin historical dataset, challenge-2-bitcoin.csv, and preview its first 5 rows.

# Dataset link
https://cdn.aibydoing.com/aibydoing/files/challenge-2-bitcoin.csv

Exercise 6.1

Challenge: Use Pandas to load the CSV file of the dataset and preview the first 5 rows of data.

import pandas as pd

### Start of code ### (≈ 2 lines of code)
df = None
### End of code ###

Solution to Exercise 6.1

# Download the dataset (the ! prefix runs a shell command from a Jupyter cell)
!wget -nc https://cdn.aibydoing.com/aibydoing/files/challenge-2-bitcoin.csv

import pandas as pd

### Start of code ### (≈ 2 lines of code)
df = pd.read_csv('challenge-2-bitcoin.csv', header=0)
df.head()
### End of code ###

                        

Expected output

	Date	btc_total_bitcoins	btc_avg_block_size	btc_n_transactions_per_block	...	btc_cost_per_transaction_percent	btc_n_unique_addresses	btc_n_transactions	btc_n_transactions_total	btc_n_transactions_excluding_popular	btc_n_transactions_excluding_chains_longer_than_100	btc_output_volume	btc_estimated_transaction_volume
0	2010-02-23 00:00:00	2110700.0	0.000216	1.0	...	25100.000000	252.0	252.0	42613.0	252.0	252.0	12600.0	50.0
1	2010-02-24 00:00:00	2120200.0	0.000282	1.0	...	179.245283	195.0	196.0	42809.0	196.0	196.0	14800.0	5300.0
2	2010-02-25 00:00:00	2127600.0	0.000227	1.0	...	1057.142857	150.0	150.0	42959.0	150.0	150.0	8100.0	700.0
3	2010-02-26 00:00:00	2136100.0	0.000319	1.0	...	64.582059	176.0	176.0	43135.0	176.0	176.0	29349.0	13162.0
4	2010-02-27 00:00:00	2144750.0	0.000223	1.0	...	1922.222222	176.0	176.0	43311.0	176.0	176.0	9101.0	450.0

As the preview shows, the original dataset has many columns. This challenge uses only 3 of them: the Bitcoin market price, the total number of bitcoins, and the Bitcoin transaction fees, whose column names are btc_market_price, btc_total_bitcoins, and btc_transaction_fees respectively.

Exercise 6.2

Challenge: Select a DataFrame that contains only the columns btc_market_price, btc_total_bitcoins, and btc_transaction_fees, and assign it to the variable data.

### Start of code ### (≈ 1 line of code)
data = None
### End of code ###

                        

Solution to Exercise 6.2

### Start of code ### (≈ 1 line of code)
data = df[['btc_market_price','btc_total_bitcoins', 'btc_transaction_fees']]
### End of code ###

                        

Run the tests

data.head()

                        

Expected output

	btc_market_price	btc_total_bitcoins	btc_transaction_fees
0	0.0	2110700.0	0.0
1	0.0	2120200.0	0.0
2	0.0	2127600.0	0.0
3	0.0	2136100.0	0.0
4	0.0	2144750.0	0.0
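The double brackets in the solution matter: a list of column names returns a DataFrame, while a single name returns a Series. This is why y_train in Exercise 6.4 has shape (2043, 1) rather than (2043,), and why the later solutions flatten the model's predictions. A toy frame with hypothetical columns a and b shows the difference:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

col_series = frame["a"]   # single brackets: one column as a 1-D Series
col_frame = frame[["a"]]  # double brackets: a list of columns as a 2-D DataFrame

print(type(col_series).__name__, col_series.shape)  # Series (2,)
print(type(col_frame).__name__, col_frame.shape)    # DataFrame (2, 1)
```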

Next, we will plot the three columns of data on three subplots arranged horizontally.

Exercise 6.3

Challenge: Plot a line chart for each of the three columns in data, arranged side by side as horizontal subplots.

Requirement: Label both axes of every chart. The x-axis label should be time for all three charts, and the y-axis label should be the name of the corresponding column.

Hint: Use set_xlabel() to set the x-axis label and set_ylabel() to set the y-axis label.

from matplotlib import pyplot as plt
%matplotlib inline

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

### Start of code ### (≈ 9 lines of code)

### End of code ###

Solution to Exercise 6.3

from matplotlib import pyplot as plt
%matplotlib inline

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

### Code starts ### (≈ 9 lines of code)
axes[0].plot(data['btc_market_price'], 'green')
axes[0].set_xlabel('time')
axes[0].set_ylabel('btc_market_price')

axes[1].plot(data['btc_total_bitcoins'], 'blue')
axes[1].set_xlabel('time')
axes[1].set_ylabel('btc_total_bitcoins')

axes[2].plot(data['btc_transaction_fees'], 'brown')
axes[2].set_xlabel('time')
axes[2].set_ylabel('btc_transaction_fees')
### Code ends ###

                        

Expected output (color can be ignored)

https://cdn.aibydoing.com/aibydoing/images/document-uid214893labid6102timestamp1531374478620.png
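The nine lines in the solution repeat one pattern per subplot, so a loop over zip(axes, columns) can produce the same figure. This is a sketch, not the tutorial's required answer; the stand-in data frame below holds only the five rows from the expected output of Exercise 6.2, and the Agg backend line is an assumption for running outside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; drop this line inside Jupyter

import pandas as pd
from matplotlib import pyplot as plt

# Stand-in for `data`: the five rows from the expected output of Exercise 6.2
data = pd.DataFrame({
    "btc_market_price": [0.0, 0.0, 0.0, 0.0, 0.0],
    "btc_total_bitcoins": [2110700.0, 2120200.0, 2127600.0, 2136100.0, 2144750.0],
    "btc_transaction_fees": [0.0, 0.0, 0.0, 0.0, 0.0],
})

colors = ["green", "blue", "brown"]
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# One iteration per subplot: plot the column, then label both axes
for ax, column, color in zip(axes, data.columns, colors):
    ax.plot(data[column], color=color)
    ax.set_xlabel("time")
    ax.set_ylabel(column)
```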

In this challenge, the features are the “total number of bitcoins” and the “bitcoin transaction fees”, and the target is the “bitcoin market price”. Next, split the dataset into a training set and a test set: the first 70% of the rows form the training set and the remaining 30% form the test set. Because the rows appear to be in date order, this split is chronological, so the model is evaluated on dates later than those it was trained on.

Exercise 6.4

Challenge: Split data so that the training set contains 70% of the rows and the test set contains 30%.

Requirement: Name the training set features, training set target, test set features, and test set target X_train, y_train, X_test, and y_test respectively, and return them from the split_dataset() function.

def split_dataset():
    """
    Parameters:
    None

    Returns:
    X_train, y_train, X_test, y_test -- training set features, training set target, test set features, test set target
    """

    ### Start of code ### (≈ 6 lines of code)

    ### End of code ###

    return X_train, y_train, X_test, y_test

                        

Solution to Exercise 6.4

def split_dataset():
    """
    Parameters:
    None

    Returns:
    X_train, y_train, X_test, y_test -- Training set features, training set target, test set features, test set target
    """
    
    ### START CODE HERE ### (≈ 6 lines of code)
    train_data = data[:int(len(data)*0.7)]
    test_data = data[int(len(data)*0.7):]
    
    X_train = train_data[['btc_total_bitcoins', 'btc_transaction_fees']]
    y_train = train_data[['btc_market_price']]
    
    X_test = test_data[['btc_total_bitcoins', 'btc_transaction_fees']]
    y_test = test_data[['btc_market_price']]
    ### END CODE HERE ###
    
    return X_train, y_train, X_test, y_test

                        

Run the tests

X_train, y_train, X_test, y_test = split_dataset()
len(X_train), len(y_train), len(X_test), len(y_test), X_train.shape, y_train.shape, X_test.shape, y_test.shape

Expected output

(2043, 2043, 877, 877, (2043, 2), (2043, 1), (877, 2), (877, 1))
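split_dataset() hard-codes the column names and the 70/30 ratio. A sketch of a more general version, assuming the frame is already sorted by date as the dataset appears to be, makes the key property explicit: rows are never shuffled, so every training date precedes every test date. chronological_split and the toy frame are hypothetical illustrations, not part of the tutorial.

```python
import pandas as pd


def chronological_split(frame, feature_cols, target_col, train_frac=0.7):
    """Hypothetical helper: split rows in their existing order, without shuffling."""
    cut = int(len(frame) * train_frac)  # same cut rule as split_dataset()
    train, test = frame.iloc[:cut], frame.iloc[cut:]
    # Double brackets keep the target as a one-column DataFrame, as in split_dataset()
    return (train[feature_cols], train[[target_col]],
            test[feature_cols], test[[target_col]])


# Toy usage on 10 consecutive days
toy = pd.DataFrame(
    {"x1": range(10), "x2": range(10, 20), "y": range(20, 30)},
    index=pd.date_range("2010-02-23", periods=10, freq="D"),
)
X_tr, y_tr, X_te, y_te = chronological_split(toy, ["x1", "x2"], "y")

# The model never sees "future" rows during training
print(len(X_tr), len(X_te), X_tr.index.max() < X_te.index.min())
```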

6.4. 3rd Degree Polynomial Regression Prediction Challenge

With the training and test sets in place, we can build a polynomial regression model. This challenge uses scikit-learn to do so.

# Import the required modules
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Load the data (call split_dataset() once and unpack the four parts)
X_train, y_train, X_test, y_test = split_dataset()

                        

Exercise 6.5

Challenge: Build a 3rd degree polynomial regression prediction model.

Requirement: Use scikit-learn to build a 3rd degree polynomial regression model, compute the MAE (mean absolute error) of its predictions on the test set, and return it from the poly3() function.
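MAE is the mean of the absolute prediction errors. Because y_train is a one-column DataFrame, LinearRegression returns predictions with shape (n, 1); the solution flattens them before scoring. A small hand-checkable sketch with made-up prices compares the formula written in NumPy against scikit-learn's mean_absolute_error:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([[100.0], [200.0], [300.0]])  # 2-D column, like y_test
y_pred = np.array([[110.0], [190.0], [330.0]])  # 2-D, like predictions from a 2-D y_train

# MAE = mean(|y_true - y_pred|): errors 10, 10, 30 -> 50 / 3
manual = np.mean(np.abs(y_true.ravel() - y_pred.ravel()))
library = mean_absolute_error(y_true, y_pred.flatten())
print(manual, library)
```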

def poly3():
    """
    Parameters:
    None

    Returns:
    mae -- the MAE evaluation metric of the prediction results
    """

    ### Start of code ### (≈ 7 lines of code)

    ### End of code ###

    return mae

                        

Solution to Exercise 6.5

def poly3():
    
    """
    Parameters:
    None

    Returns:
    mae -- The MAE evaluation metric of the prediction results
    """
    
    ### START CODE HERE ### (≈ 7 lines of code)
    poly_features = PolynomialFeatures(degree=3, include_bias=False)
    poly_X_train = poly_features.fit_transform(X_train)
    poly_X_test = poly_features.transform(X_test)

    model = LinearRegression()
    model.fit(poly_X_train, y_train)
    pre_y = model.predict(poly_X_test)
    
    mae = mean_absolute_error(y_test, pre_y.flatten())
    ### END CODE HERE ###
    
    return mae

                        

Run the tests

poly3()

Expected output

1955.8027790596564
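The btc_total_bitcoins values are in the millions (2,110,700 in the first row), so their cubes are around 10^19 or larger, while the fee columns are much smaller. Polynomial columns on such different scales can make the least-squares fit numerically fragile. A common remedy, sketched below as a hypothetical variant of poly3() (not the tutorial's solution), is to standardize the features before expanding them. Its MAE on the Bitcoin split may differ from the expected output above; the synthetic data only checks that the pipeline runs and recovers an exactly linear target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def poly_mae_scaled(X_train, y_train, X_test, y_test, degree=3):
    """Hypothetical variant of poly3(): standardize features, then expand and fit."""
    model = make_pipeline(
        StandardScaler(),                                       # zero mean, unit variance per feature
        PolynomialFeatures(degree=degree, include_bias=False),  # same expansion as poly3()
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # ravel() makes both arrays 1-D whether y was a Series or a one-column DataFrame
    return mean_absolute_error(np.ravel(y_test), np.ravel(pred))


# Synthetic check: features on very different scales, target exactly linear in them
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2)) * np.array([1e7, 10.0])
y = 2e-6 * X[:, 0] + 3.0 * X[:, 1]
mae = poly_mae_scaled(X[:35], y[:35], X[35:], y[35:])
print(mae)
```

With the tutorial's variables, the call would be poly_mae_scaled(X_train, y_train, X_test, y_test).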

6.5. Nth Degree Polynomial Regression Prediction Plot

Next, compute the MSE of the test-set predictions for a range of polynomial degrees and plot how it changes with the degree.
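For reference, with n test samples, true prices y_i and predictions ŷ_i, the two metrics used in this chapter are defined below. MSE squares each error, so it is expressed in squared price units and weights large errors more heavily; this is why its values in the expected output (tens of millions and up) are on a very different scale from the MAE reported by poly3().

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```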

Exercise 6.6

Challenge: Calculate the MSE of the polynomial regression predictions for degrees 1, 2, …, 10.

Requirement: Use scikit-learn to build polynomial regression models of degree 1 through N, compute the MSE of each model's predictions on the test set, and return the list of MSE values from the function poly_plot(N).
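The degree sweep is easier to reason about on data whose true degree is known. The sketch below uses the same pipeline as poly_plot but with a for loop, on a synthetic, noise-free target y = 1 + 2x + 3x² (an assumption for illustration; mse_by_degree is a hypothetical name). Degree 1 underfits and extrapolates poorly, while degree 2 recovers the target almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def mse_by_degree(X_train, y_train, X_test, y_test, max_degree):
    """Hypothetical helper: test-set MSE for polynomial degrees 1..max_degree."""
    scores = []
    for degree in range(1, max_degree + 1):
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
        model.fit(X_train, y_train)
        scores.append(mean_squared_error(y_test, model.predict(X_test)))
    return scores


def target(x):
    # Synthetic quadratic target, returned as a 1-D array
    return (1 + 2 * x + 3 * x**2).ravel()


# Train on [0, 1], test on the later range [1, 1.5], mimicking a chronological split
x_train = np.linspace(0, 1, 20).reshape(-1, 1)
x_test = np.linspace(1, 1.5, 10).reshape(-1, 1)
scores = mse_by_degree(x_train, target(x_train), x_test, target(x_test), max_degree=4)
print(scores)
```

On noisy real data, higher degrees can instead overfit the training period, which is consistent with the MSE rising at higher degrees in the expected output of this exercise.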

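from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error


def poly_plot(N):
    """
    Parameters:
    N -- scalar, highest polynomial degree to evaluate

    Returns:
    mse -- list of MSE values for the predictions of the degree 1 to N polynomial models
    """

    m = 1
    mse = []

    ### Start of code ### (≈ 6 lines of code)

    ### End of code ###

    return mse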

                        

Solution to Exercise 6.6

from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

def poly_plot(N):
    """
    Parameters:
    N -- scalar, highest polynomial degree to evaluate

    Returns:
    mse -- list of MSE values for the predictions of the degree 1 to N polynomial models
    """
    
    m = 1
    mse = []
    
    ### START CODE HERE ### (≈ 6 lines of code)
    while m <= N:
        model = make_pipeline(PolynomialFeatures(m, include_bias=False), LinearRegression())
        model.fit(X_train, y_train)
        pre_y = model.predict(X_test)
        mse.append(mean_squared_error(y_test, pre_y.flatten()))
        m = m + 1
    
    ### END CODE HERE ###
    
    return mse

                        

Run the tests

poly_plot(10)[:10:3]  # every third entry: MSE for degrees 1, 4, 7 and 10

                        

Expected output (the results may vary slightly)

[24171680.63629423, 23772159.453013, 919854753.0234015, 3708858661.222856]
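These values span more than two orders of magnitude, so on a linear y-axis the low-degree points are squeezed near zero. A logarithmic y-axis may be easier to read; the sketch below is an optional variation on the plot in Exercise 6.7, using the four values printed above (degrees 1, 4, 7 and 10). The Agg backend line is an assumption for running outside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; drop this line inside Jupyter

from matplotlib import pyplot as plt

# The four MSE values printed above by poly_plot(10)[:10:3]
degrees = [1, 4, 7, 10]
mse_subset = [24171680.63629423, 23772159.453013, 919854753.0234015, 3708858661.222856]

fig, ax = plt.subplots()
ax.plot(degrees, mse_subset, "r-o")  # red line with circle markers
ax.set_yscale("log")                 # equal ratios take equal vertical space
ax.set_xlabel("N")
ax.set_ylabel("MSE (log scale)")
```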

Exercise 6.7

Challenge: Plot the MSE values as a line plot.

Requirement: Plot the list of MSE values returned by poly_plot(10) as a combined line plot and scatter plot, with the line drawn in red.

mse = poly_plot(10)

### Start of code ### (≈ 2 lines of code)

### End of code ###

plt.title("MSE")
plt.xlabel("N")
plt.ylabel("MSE")

Solution to Exercise 6.7

mse = poly_plot(10)

### Start of code ### (≈ 2 lines of code)
plt.plot(range(1, 11), mse, 'r')  # red line; x-axis is the polynomial degree 1..10
plt.scatter(range(1, 11), mse)    # a point at each degree
### End of code ###

### Alternative solution: one call drawing a red line with circle markers ###
# plt.plot(range(1, 11), mse, 'r-o')

plt.title("MSE")
plt.xlabel("N")
plt.ylabel("MSE")

                        

Expected output
