Pandas calculate roll_std from top N rows of data
I have a dataframe like this:
date A
2015.1.1 10
2015.1.2 20
2015.1.3 30
2015.1.4 40
2015.1.5 50
2015.1.6 60
I need to calculate the std of the top N rows, for example:
date A std
2015.1.1 10 std(10)
2015.1.2 20 std(10,20)
2015.1.3 30 std(10,20,30)
2015.1.4 40 std(10,20,30,40)
2015.1.5 50 std(10,20,30,40,50)
2015.1.6 60 std(10,20,30,40,50,60)
pd.rolling_std can do this for a fixed N, but how do I change N dynamically?
df[['A']].apply(lambda x:pd.rolling_std(x,N))
<class 'pandas.core.frame.DataFrame'>
Index: 75 entries, 2015-04-16 to 2015-07-31
Data columns (total 4 columns):
A 75 non-null float64
dtypes: float64(4)
memory usage: 2.9+ KB
This can be done by calling apply on the df like this:
In [29]:
def func(x):
    # x.name is the row's integer index label; take all rows up to and including it
    return df.iloc[:x.name + 1][x.index].std()

df['std'] = df[['A']].apply(func, axis=1)
df
Out[29]:
date A std
0 2015.1.1 10 NaN
1 2015.1.2 20 7.071068
2 2015.1.3 30 10.000000
3 2015.1.4 40 12.909944
4 2015.1.5 50 15.811388
5 2015.1.6 60 18.708287
This uses double brackets [[]] to call apply on a df with a single column, which lets you pass the param axis=1 so that the function is called row-wise. Each row then gives you access to its index label, which is the name attribute, and to its column name, which is the index attribute. Together these let you slice your df to calculate the std over all rows up to the current one.
You can add a window arg to func to change the window as desired.
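A minimal sketch of what such a window arg could look like. The parameter name window, the column names std_all and std_last3, and the sample frame are assumptions made for illustration, and the sketch relies on a default integer index so that x.name is the row position:

```python
import pandas as pd

# Sample data mirroring the question (default integer index assumed)
df = pd.DataFrame({
    "date": ["2015.1.1", "2015.1.2", "2015.1.3",
             "2015.1.4", "2015.1.5", "2015.1.6"],
    "A": [10, 20, 30, 40, 50, 60],
})

def func(x, window=None):
    # x.name is the integer row label, x.index holds the column name(s)
    col = x.index[0]
    stop = x.name + 1
    # window=None means "all rows from the top"; otherwise only the last `window` rows
    start = 0 if window is None else max(0, stop - window)
    return df[col].iloc[start:stop].std()

# Extra keyword arguments to apply are forwarded to func
df["std_all"] = df[["A"]].apply(func, axis=1)
df["std_last3"] = df[["A"]].apply(func, axis=1, window=3)
print(df)
```

With window=None this reproduces the cumulative result shown above; with window=3 each row only looks at itself and the two rows before it.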
EDIT
It looks like your index holds str labels rather than integers, so slice by label instead; the following should work:
In [39]:
def func(x):
    # .ix was removed in pandas 1.0; .loc slices by label and includes the end label
    return df.loc[:x.name][x.index].std()

df['std'] = df[['A']].apply(func, axis=1)
df
Out[39]:
A std
date
2015.1.1 10 NaN
2015.1.2 20 7.071068
2015.1.3 30 10.000000
2015.1.4 40 12.909944
2015.1.5 50 15.811388
2015.1.6 60 18.708287
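For comparison, here is a sketch of a different technique that avoids apply altogether. It assumes a pandas version that provides Series.expanding and Series.rolling (0.18 or later), and it is an alternative to the answer's approach rather than a restatement of it. The value N = 3 and the column name std_last_n are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40, 50, 60]},
    index=pd.Index(
        ["2015.1.1", "2015.1.2", "2015.1.3",
         "2015.1.4", "2015.1.5", "2015.1.6"],
        name="date",
    ),
)

# Expanding window: each row uses every row from the top down to itself
df["std"] = df["A"].expanding().std()

# Rolling window capped at N rows; min_periods=1 lets the first rows use fewer
N = 3
df["std_last_n"] = df["A"].rolling(window=N, min_periods=1).std()
print(df)
```

The first row is NaN in both columns because the sample standard deviation of a single value is undefined.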