Calculate weighted sums in rolling window with pandas data frames of varying length
I have a large DataFrame (more than 5,000,000 rows) on which I am calculating a rolling sum:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 1), columns=['rand'])
sum_abs = df.rolling(5).sum()
I would like to do the same calculation, but as a weighted sum:
df2 = pd.DataFrame(pd.Series([1, 2, 3, 4, 5], name='weight'))
df3 = df.mul(df2.set_index(df.index)).rolling(5).sum()  # fails: df2 has 5 rows, df.index has 10000
However, as expected, I get a length-mismatch error because the weights have only 5 elements. I know I could do something like [a * b for a, b in zip(L, weight)] if I converted everything to lists, but I would like to keep the data in a DataFrame if possible. Is there a way to multiply frames of different sizes, or do I need to repeat the set of weights until its length matches the frame?
1 answer
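An easy way to do it:
w = np.arange(1, 6)  # weights 1..5, oldest to newest value in each window
df.rolling(5).apply(lambda x: (x * w).sum(), raw=True)  # raw=True passes each window as an ndarray, usually faster than a Series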
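To check what the `apply` version computes, here is a minimal sketch on six hand-checkable values (hypothetical data, not from the question). It also shows a swapped-in alternative that is not part of the original answer: a weighted rolling sum over a single column is a 1-D convolution, so no weights need to be repeated. `np.convolve` flips its second argument, so the weights are passed reversed to keep weight 5 on the newest value.

```python
import numpy as np
import pandas as pd

# Small, hand-checkable input (hypothetical data, not from the question)
df = pd.DataFrame({"rand": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
k = 5
w = np.arange(1, k + 1)  # weight 1 on the oldest value, k on the newest

# The answer's approach: weighted sum inside each rolling window
# raw=True passes each window as a NumPy array instead of a Series
by_apply = df["rand"].rolling(k).apply(lambda x: (x * w).sum(), raw=True)

# Swapped-in alternative: convolution with the reversed weights.
# mode="valid" keeps only positions where the window is full.
by_conv = np.convolve(df["rand"].to_numpy(), w[::-1], mode="valid")

print(by_apply.dropna().tolist())  # [55.0, 70.0]
print(by_conv.tolist())            # [55.0, 70.0]
```

The first full window is 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55 and the second is 1*2 + 2*3 + 3*4 + 4*5 + 5*6 = 70. The rolling result keeps the DataFrame index (with NaN for the first k - 1 rows), while the convolution returns a bare array of the full windows only.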
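A less easy way, with strides:
from numpy.lib.stride_tricks import as_strided as strided

v = df.values
n, m = v.shape
s1, s2 = v.strides
k = 5
w = np.arange(1, 6).reshape(1, 1, k)  # weights broadcast along the trailing window axis

# Strided view of shape (n - k + 1, m, k): element [i, j, l] is v[i + l, j], no copy
pd.DataFrame(
    (strided(v, (n - k + 1, m, k), (s1, s2, s1)) * w).sum(-1),
    df.index[k - 1:], df.columns)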
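On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` builds the same (n - k + 1, m, k) window view as the `as_strided` call, but computes the shape and strides itself and returns a read-only view, so a wrong stride cannot read past the end of the buffer. A minimal sketch, assuming NumPy 1.20+; `weighted_rolling_sum` is a hypothetical helper name, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def weighted_rolling_sum(df: pd.DataFrame, weights) -> pd.DataFrame:
    """Weighted sum over each length-k window, column by column.

    weights[0] multiplies the oldest row in the window, weights[-1] the newest.
    """
    w = np.asarray(weights)
    k = len(w)
    v = df.to_numpy()
    # Read-only view of shape (n - k + 1, m, k); the window axis is appended last
    windows = sliding_window_view(v, k, axis=0)
    # Broadcast the length-k weights over the trailing axis, then reduce it
    out = (windows * w).sum(axis=-1)
    # Row i of out is the window that ends at row i + k - 1 of df
    return pd.DataFrame(out, index=df.index[k - 1:], columns=df.columns)


# Usage on hand-checkable data (hypothetical values)
df = pd.DataFrame({"rand": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
result = weighted_rolling_sum(df, np.arange(1, 6))
print(result["rand"].tolist())  # [55.0, 70.0]
```

Like the strided version, this drops the first k - 1 rows instead of filling them with NaN; reindex with `df.index` if the result must line up with the original frame.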
naive time test
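The original timing results did not survive. A sketch for reproducing such a comparison with `timeit`, assuming 10,000 random rows like the question's example; the function names are hypothetical, and the timings printed will depend on the machine and on the pandas and NumPy versions (NumPy 1.20+ is assumed for `sliding_window_view`).

```python
import timeit

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10_000, 1)), columns=["rand"])
k = 5
w = np.arange(1, k + 1)


def with_apply():
    # Calls the Python lambda once per window
    return df.rolling(k).apply(lambda x: (x * w).sum(), raw=True)


def with_windows():
    # One vectorized multiply-and-sum over a window view
    v = df.to_numpy()
    out = (sliding_window_view(v, k, axis=0) * w).sum(axis=-1)
    return pd.DataFrame(out, index=df.index[k - 1:], columns=df.columns)


# Both must agree on the fully populated windows before timing them
assert np.allclose(with_apply().iloc[k - 1:].to_numpy(), with_windows().to_numpy())

for fn in (with_apply, with_windows):
    secs = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {secs * 1e3:.1f} ms per call")
```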