39. Agricultural Production Index Modeling Analysis
39.1. Introduction
This challenge applies a machine learning model to the agricultural production index, using historical data to predict future values of the index.
39.2. Key Points
Data preprocessing
Data resampling
Usage of Prophet
39.3. Challenge Introduction
In this challenge, we will become familiar with the ARIMA modeling process and methods, and obtain reasonable model parameters according to the requirements of the experiment.
39.4. Challenge Content
The challenge provides data on China’s agricultural production index from 1952 to 1988, collected in the data file `agriculture.csv`. Download it with:
wget -nc https://cdn.aibydoing.com/aibydoing/files/agriculture.csv
The dataset consists of two columns. A preview of the first 5 rows is as follows:
| | year | values |
|---|---|---|
| 0 | 1952 | 100.0 |
| 1 | 1953 | 101.6 |
| 2 | 1954 | 103.3 |
| 3 | 1955 | 111.5 |
| 4 | 1956 | 116.5 |
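The preview above can be reproduced with pandas. The following is a minimal sketch, assuming the CSV has a header row with the columns `year` and `values` as shown in the preview. To keep the snippet self-contained, the five preview rows are inlined as text; in the challenge you would pass the path `agriculture.csv` to `pd.read_csv` instead.

```python
import io

import pandas as pd

# The five preview rows, inlined so the snippet runs without the download.
# Assumption: the real file uses the same "year,values" header.
preview_csv = io.StringIO(
    "year,values\n"
    "1952,100.0\n"
    "1953,101.6\n"
    "1954,103.3\n"
    "1955,111.5\n"
    "1956,116.5\n"
)

# Use the year as the index so the single remaining column is the series.
df = pd.read_csv(preview_csv, index_col="year")
print(df.head())
```

Indexing by year leaves a single `values` column, which is a convenient shape for time series modeling.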
Exercise 39.1
The challenge requires performing time series ARIMA modeling on this data file and returning three reasonable parameters for ARIMA(p, d, q).
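In ARIMA(p, d, q), p is the autoregressive order, d is the number of times the series is differenced, and q is the moving-average order. As a small illustration of what d = 1 means, the sketch below applies first-order differencing to the five preview values only; it is not a fit on the full dataset.

```python
import pandas as pd

# The first five values of the index, taken from the preview table.
values = pd.Series(
    [100.0, 101.6, 103.3, 111.5, 116.5],
    index=[1952, 1953, 1954, 1955, 1956],
    name="values",
)

# First-order differencing: each value minus the previous one.
# The first entry has no predecessor, so it is NaN and is dropped.
first_diff = values.diff().dropna()
print(first_diff.round(1).tolist())  # year-over-year changes: 1.6, 1.7, 8.2, 5.0
```

The differenced series is one element shorter than the original, and an ARMA(p, q) model fitted to it corresponds to an ARIMA(p, 1, q) model of the original series.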
Before starting the challenge, open a terminal and run the following command to install the statsmodels library.
pip install statsmodels
39.5. Challenge Requirements
- Save the code in the `Code` folder and name it `production_index.py`.
- Complete the code in `def arima()` below. At the end of the challenge, the function must return the `p`, `d`, `q` parameters of the ARIMA model.
- When modeling, do not hold out test data: fit on all of the data, and select the model order by AIC.
- When testing, run `production_index.py` with `python` to avoid missing-module errors.
39.6. Example Code
def arima():
    ### Complete the code ###
    return p, d, q
{solution-start} chapter04_06_1
:class: dropdown
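import pandas as pd
from statsmodels.tsa.stattools import arma_order_select_ic


def arima():
    # Load the data, using the first column (year) as the index
    df = pd.read_csv("agriculture.csv", index_col=0)
    # Apply first-order differencing, which fixes d = 1
    diff = df.diff().dropna()
    # Choose the ARMA order (p, q) of the differenced series by minimum AIC
    p, q = arma_order_select_ic(diff, ic="aic")["aic_min_order"]
    d = 1
    return p, d, q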
{solution-end}