35. Association Rule Analysis of Shopping Data
35.1. Introduction
This challenge analyzes shopping basket data with association rule mining, finding the frequent itemsets and the association rules among them.
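For reference, these are the standard definitions of the measures the challenge relies on. They are given for a set of transactions $T$ and itemsets $X$ and $Y$:

```latex
\operatorname{support}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|}
\qquad
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{support}(Y)}
```

The dataset in this challenge has 7,500 baskets. A minimum support of 0.05 therefore means an itemset must appear in at least $0.05 \times 7500 = 375$ baskets.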
35.2. Knowledge Points
Dataset creation
Data preprocessing
Application of the Apriori algorithm
Generation of association rules
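Data preprocessing here means turning variable-length baskets into a fixed-width table that Apriori can scan. The solution uses mlxtend's TransactionEncoder for this step. The stdlib-only sketch below illustrates the same idea, with one boolean column per distinct item and one row per basket. The function `one_hot` is hypothetical and is not part of mlxtend. Its columns are sorted alphabetically only so that the layout is stable.

```python
def one_hot(transactions):
    """Encode baskets as a boolean matrix with one column per distinct item."""
    # Every distinct item becomes a column; sorting gives a stable column order
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # set lookup is O(1) per column
        rows.append([item in present for item in columns])
    return columns, rows


columns, rows = one_hot([["milk", "bread"], ["bread", "eggs"]])
print(columns)
print(rows)
```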
35.3. Challenge Introduction
The challenge provides a supermarket shopping dataset containing 7,500 records, each representing a single shopping cart. The data can be downloaded with:
wget -nc https://cdn.aibydoing.com/aibydoing/files/shopping_data.csv
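The sketch below rests on an assumption that should be checked by inspecting the file. It assumes each row of shopping_data.csv lists the items of one basket, and that shorter baskets leave their trailing cells empty. Under that assumption, the standard csv module can turn the rows into item lists. The solution does the equivalent with pandas. The sample text and the helper `load_transactions` are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the assumed layout: one basket per row, unused cells left empty
SAMPLE_CSV = """milk,bread,eggs
bread,butter,
milk,,
bread,eggs,butter
"""


def load_transactions(text):
    """Parse CSV text into a list of baskets, dropping empty cells."""
    transactions = []
    for row in csv.reader(io.StringIO(text)):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # skip rows that contain no items at all
            transactions.append(items)
    return transactions


transactions = load_transactions(SAMPLE_CSV)
print(len(transactions))
print(transactions[2])
```

To read the real file, replace the sample string with the file's contents, for example with `open("shopping_data.csv", newline="").read()`.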
35.4. Challenge Content
Exercise 35.1
The challenge uses the Apriori algorithm to perform association rule analysis on the dataset. Find the frequent itemsets that meet a minimum support threshold of 0.05, and compute the association rules that meet a minimum confidence threshold of 0.2.
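The small from-scratch sketch below makes the two thresholds concrete. It shows the level-wise Apriori search followed by rule generation. It is illustrative only and is not mlxtend's implementation. The function names `apriori_sketch` and `rules_sketch` and the four toy baskets are made up. The toy thresholds (0.5 support, 0.6 confidence) also differ from the exercise's values, so that the output stays small.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: keep single items that reach the support threshold
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (Apriori property)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules_sketch(frequent, min_confidence):
    """Turn frequent itemsets into (antecedent, consequent, support, confidence) rules."""
    found = []
    for itemset, supp in frequent.items():
        # Split each itemset of size >= 2 into every non-empty antecedent/consequent pair
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so the lookup succeeds
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, supp, confidence))
    return found


# Four made-up baskets; toy thresholds chosen to keep the output small
transactions = [["milk", "bread"], ["bread", "butter"], ["milk", "bread", "butter"], ["bread"]]
frequent = apriori_sketch(transactions, min_support=0.5)
found = rules_sketch(frequent, min_confidence=0.6)
for antecedent, consequent, supp, conf in found:
    print(set(antecedent), "=>", set(consequent), f"support={supp:.2f} confidence={conf:.2f}")
```

On the toy data, the rules milk ⇒ bread and butter ⇒ bread survive with confidence 1.0. The rules bread ⇒ milk and bread ⇒ butter have confidence 0.5, so they are filtered out. The pruning step is what keeps Apriori tractable. The candidate {milk, bread, butter} is discarded without being counted, because its subset {milk, butter} is not frequent.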
Before starting, open a terminal and run the following command to install the mlxtend machine learning library:
pip install mlxtend
35.5. Challenge Requirements
- The code must be saved in the Code folder and named association.py.
- Write your code inside def rule() without modifying the function name.
- The function must return the DataFrames for the frequent itemsets and the association rules, in that order.
- During testing, run association.py with python to avoid errors caused by missing modules.
35.6. Example Code
def rule():
    ### Add your code here ###
    return frequent_itemsets, association_rules  # Return the DataFrames for the frequent itemsets and association rules
Special attention: The function name rule() must not be modified, and no parameters may be added to rule(). Otherwise, the system will not be able to judge the results correctly.
Solution to Exercise 35.1
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules as rules
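def rule():
    # Read the raw CSV; each row is one shopping cart
    df = pd.read_csv("shopping_data.csv", header=None)
    # Collect each row's non-empty cells into a list (one basket per row)
    dataset = df.stack().dropna().groupby(level=0).apply(list).tolist()
    te = TransactionEncoder()  # Define the encoder
    te_ary = te.fit_transform(dataset)  # One-hot encode the baskets into a boolean array
    df = pd.DataFrame(te_ary, columns=te.columns_)  # Wrap the array in a DataFrame with item names as columns
    # Frequent itemsets with support >= 0.05; use_colnames=True reports item names instead of column indices
    frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)
    # Association rules with confidence >= 0.2
    association_rules = rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
    return frequent_itemsets, association_rules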
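You can sanity-check a row of the returned association_rules DataFrame by recomputing its support, confidence, and lift through direct counting. The stdlib-only sketch below does this. Here `rule_metrics` is a hypothetical helper, and `transactions` is assumed to be a list of baskets like the `dataset` list built inside rule(). The toy baskets are made up.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Recompute support, confidence and lift of antecedent => consequent by counting baskets."""
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    count_a = sum(1 for b in baskets if a <= b)          # baskets containing the antecedent
    count_c = sum(1 for b in baskets if c <= b)          # baskets containing the consequent
    count_ac = sum(1 for b in baskets if (a | c) <= b)   # baskets containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


toy = [["milk", "bread"], ["bread", "butter"], ["milk", "bread", "butter"], ["eggs"]]
print(rule_metrics(toy, ["milk"], ["bread"]))
```

Compare these values with the support, confidence, and lift columns of the matching row. Agreement is a quick way to confirm that the DataFrame was built from the intended transactions.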