39. Agricultural Production Index Modeling Analysis
39.1. Introduction
This challenge uses a time series model to analyze China's agricultural production index and to forecast its future values from historical data.
39.2. Key Points
Data preprocessing
Data resampling
Usage of ARIMA
39.3. Challenge Introduction
In this challenge, you will become familiar with the ARIMA modeling process and, following the requirements below, determine reasonable parameters for the model.
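For reference, here is ARIMA(p, d, q) in its standard textbook lag-operator form. The notation is assumed rather than taken from this challenge: $L$ is the lag operator ($L y_t = y_{t-1}$), $\phi_i$ are the autoregressive coefficients, $\theta_j$ the moving-average coefficients, $c$ a constant, and $\varepsilon_t$ white noise.

$$
\Bigl(1 - \sum_{i=1}^{p} \phi_i L^i\Bigr)(1 - L)^d\, y_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j L^j\Bigr)\varepsilon_t
$$

Here p is the autoregressive order, d the number of differences, and q the moving-average order. With $d = 1$, the model reduces to an ARMA(p, q) model for the first difference $\Delta y_t = y_t - y_{t-1}$.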
39.4. Challenge Content
The challenge provides data on China's agricultural production index from 1952 to 1988, compiled in the data file `agriculture.csv`. Download it with:

```bash
wget -nc https://cdn.aibydoing.com/aibydoing/files/agriculture.csv
```
The dataset consists of two columns. A preview of the first 5 rows is as follows:
| | year | values |
|---|---|---|
| 0 | 1952 | 100.0 |
| 1 | 1953 | 101.6 |
| 2 | 1954 | 103.3 |
| 3 | 1955 | 111.5 |
| 4 | 1956 | 116.5 |
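A minimal sketch of loading the data with pandas, assuming the CSV contains exactly the two columns `year` and `values` described above. So that it runs without downloading anything, it reads the five preview rows from an in-memory string; with the real file you would pass `"agriculture.csv"` instead. Setting `index_col=0` makes `year` the index, leaving `values` as the single column to model.

```python
import io

import pandas as pd

# The five preview rows from the table above, as CSV text
# (a stand-in for agriculture.csv so the example runs offline)
csv_text = """year,values
1952,100.0
1953,101.6
1954,103.3
1955,111.5
1956,116.5
"""

# Use the first column (year) as the index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.shape)           # (5, 1): only the "values" column remains
print(df.index.tolist())  # [1952, 1953, 1954, 1955, 1956]

# First-order difference: year-over-year change in the index
diff = df.diff().dropna()
print(diff["values"].round(1).tolist())  # [1.6, 1.7, 8.2, 5.0]
```

The first row is dropped after differencing because it has no predecessor, so a differenced series is always one observation shorter than the original.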
Exercise 39.1
The challenge requires you to build a time series ARIMA model on this data file and return three reasonable parameters for ARIMA(p, d, q).
Before starting the challenge, open a terminal and run the following command to install the statsmodels library:

```bash
pip install statsmodels
```
39.5. Challenge Requirements
- The code must be saved in the `Code` folder and named `production_index.py`.
- Complete the code in `def arima()` below. At the end of the challenge, the function must return the `p`, `d`, and `q` parameters of the ARIMA model.
- When modeling, do not hold out test data: use all of the data, and select the model order using the AIC criterion.
- When testing, run `production_index.py` with `python` to avoid missing-module errors.
39.6. Example Code
```python
def arima():
    ### Add your code here ###
    return p, d, q
```
{solution-start} chapter04_06_1
:class: dropdown
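```python
import pandas as pd
from statsmodels.tsa.stattools import arma_order_select_ic


def arima():
    # Use the year column as the index so only "values" is modeled
    df = pd.read_csv("agriculture.csv", index_col=0)
    # First-order differencing to make the series (approximately) stationary
    diff = df.diff().dropna()
    # Choose (p, q) for the differenced series by minimum AIC
    p, q = arma_order_select_ic(diff, ic="aic")["aic_min_order"]
    d = 1  # one round of differencing was applied above
    return p, d, q
```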
{solution-end}