Calculating a Rolling Mean

Published: 2023-12-28 15:48:58 | Author: Bonne_chance

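A rolling mean (moving average) is the mean of a fixed number of the most recent observations in a time series, including the current one.
pandas provides a built-in method for this calculation.
Syntax: df['column_name'].rolling(rolling_window).mean()
Example: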

import numpy as np
import pandas as pd

#make this example reproducible
np.random.seed(0)

#create dataset
period = np.arange(1, 101, 1) # periods 1 through 100
leads = np.random.uniform(1, 20, 100)
sales = 60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100)
df = pd.DataFrame({'period': period, 'leads': leads, 'sales': sales})

#view first 10 rows
df.head(10)

Result:

 period	leads	sales
0	1	11.427457	61.417425
1	2	14.588598	64.900826
2	3	12.452504	66.698494
3	4	11.352780	64.927513
4	5	9.049441	73.720630
5	6	13.271988	77.687668
6	7	9.314157	78.125728
7	8	17.943687	75.280301
8	9	19.309592	73.181613
9	10	8.285389	85.272259

Use pandas to compute the rolling mean of sales over a 5-period window (the current period and the four before it).

#find rolling mean of previous 5 sales periods
df['rolling_sales_5'] = df['sales'].rolling(5).mean()

#view first 10 rows
df.head(10)

Result:

 period	leads	sales	rolling_sales_5
0	1	11.427457	61.417425	NaN
1	2	14.588598	64.900826	NaN
2	3	12.452504	66.698494	NaN
3	4	11.352780	64.927513	NaN
4	5	9.049441	73.720630	66.332978
5	6	13.271988	77.687668	69.587026
6	7	9.314157	78.125728	72.232007
7	8	17.943687	75.280301	73.948368
8	9	19.309592	73.181613	75.599188
9	10	8.285389	85.272259	77.909514
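
The first four values of rolling_sales_5 are NaN because rolling(5) only produces a value once five observations are available. The following is a minimal sketch on a small hypothetical series (the values are made up for illustration and are not part of the dataset above). It shows this behaviour next to the min_periods parameter of rolling, which allows a mean to be computed from fewer observations at the start of the series:

```python
import numpy as np
import pandas as pd

# hypothetical toy series, chosen so the means are easy to check by hand
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# default: a value appears only once the window holds 3 observations
strict = s.rolling(3).mean()

# min_periods=1: average whatever is available at the start of the series
partial = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({'value': s, 'strict': strict, 'partial': partial}))
```

Whether partial windows are acceptable depends on the analysis: the early values in the partial column are averages over fewer points, so they are noisier than the rest.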

The same syntax computes rolling means for other columns:

#find rolling mean of previous 5 leads periods 
df['rolling_leads_5'] = df['leads'].rolling(5).mean() 

#find rolling mean of previous 5 sales periods
df['rolling_sales_5'] = df['sales'].rolling(5).mean()

#view first 10 rows
df.head(10)

Result:

 period	leads	sales	rolling_sales_5	rolling_leads_5
0	1	11.427457	61.417425	NaN	NaN
1	2	14.588598	64.900826	NaN	NaN
2	3	12.452504	66.698494	NaN	NaN
3	4	11.352780	64.927513	NaN	NaN
4	5	9.049441	73.720630	66.332978	11.774156
5	6	13.271988	77.687668	69.587026	12.143062
6	7	9.314157	78.125728	72.232007	11.088174
7	8	17.943687	75.280301	73.948368	12.186411
8	9	19.309592	73.181613	75.599188	13.777773
9	10	8.285389	85.272259	77.909514	13.624963
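
When several columns share the same window, the rolling mean can also be computed for all of them in one call. The sketch below assumes a small stand-in DataFrame with the same column names as above but made-up values. The cols and window names are illustrative, and the naming scheme built with add_prefix and add_suffix is just one possible choice:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for df (same column names, made-up values)
df = pd.DataFrame({
    'leads': [4.0, 8.0, 6.0, 10.0, 12.0, 2.0],
    'sales': [60.0, 62.0, 70.0, 68.0, 75.0, 80.0],
})

cols = ['leads', 'sales']
window = 5

# rolling mean of both columns at once; rename to rolling_<column>_<window>
rolled = (
    df[cols]
    .rolling(window)
    .mean()
    .add_prefix('rolling_')
    .add_suffix(f'_{window}')
)

# attach the new columns to the original frame (aligned on the index)
df = df.join(rolled)
print(df)
```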

Compare the rolling means for different window sizes:

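import matplotlib.pyplot as plt

#find rolling mean of previous 10 sales periods (needed for the comparison plot)
df['rolling_sales_10'] = df['sales'].rolling(10).mean()

plt.plot(df['rolling_sales_5'], label='Rolling Mean Window=5')
plt.plot(df['rolling_sales_10'], label='Rolling Mean Window=10')
plt.plot(df['sales'], label='Raw Data')
plt.legend()
plt.ylabel('Sales')
plt.xlabel('Period')
plt.show()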

Result: a line chart of the raw sales together with the window=5 and window=10 rolling means.

Conclusion:
A rolling mean smooths the data: the larger the window, the smoother the curve and the closer it comes to the trend line.
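
One rough way to put a number on this smoothing effect is to measure how much each series changes from one period to the next. The sketch below regenerates the same synthetic sales series and compares window sizes. The roughness helper is a hypothetical name introduced here for illustration, and the standard deviation of first differences is only one of several possible smoothness measures:

```python
import numpy as np
import pandas as pd

# same synthetic sales series as in the example above
np.random.seed(0)
period = np.arange(1, 101, 1)
leads = np.random.uniform(1, 20, 100)  # drawn only to keep the random stream identical
sales = pd.Series(60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100))

def roughness(series):
    # hypothetical helper: std of period-to-period changes (NaN values are skipped)
    return series.diff().std()

# window=1 leaves the series unchanged, so it serves as the raw-data baseline
for window in (1, 5, 10):
    smoothed = sales.rolling(window).mean()
    print(f'window={window:>2}: roughness={roughness(smoothed):.2f}')
```

A lower value means smaller jumps between consecutive periods, which corresponds to a visually smoother line in the plot.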