Pandas expanding window apply for beta regression
Hey. I am trying to calculate regression coefficients (betas) over an expanding window in pandas. I have the following function to calculate beta:
def beta(row, col1, col2):
    return numpy.cov(row[col1], row[col2]) / numpy.var(row[col1])
And I tried the following to get an expanding beta on my DataFrame df:
pandas.expanding_apply(df, beta, col1='col1', col2='col2')
pandas.expanding_apply(df, beta, kwargs={'col1': 'col1', 'col2': 'col2'})
df.expanding().apply(...)
However, none of them work. I either get an error saying that no kwargs are being passed, or, if I hard-code the column names in the function beta, I get:
*** IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Thanks.
Example:
import numpy
import pandas

def beta(row, col1, col2):
    return numpy.cov(row[col1], row[col2]) / numpy.var(row[col1])

df = pandas.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})
pandas.expanding_apply(df, beta, col1='a', col2='b')
pandas.expanding_apply(df, beta, kwargs={'col1': 'a', 'col2': 'b'})
Both return errors.
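I ran into this issue while trying to compute beta for a rolling multiple regression, very similar to what you are doing (see here). The main problem is that for Expanding.apply(func, args=(), kwargs={}), the func param "must produce a single value from an ndarray input; *args and **kwargs are passed to the function" [source].
In other words, apply calls func separately for each column and hands it a 1-D array holding that column's window, so func never sees col1 and col2 together, and a label lookup such as row['a'] on that array raises the IndexError you saw. There is no way to get around this with expanding.apply. (Note: expanding_apply is deprecated, as mentioned.)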
Below is a workaround. It's more computationally expensive (it eats up memory), but it will get you the output. It creates a list of NumPy arrays, one per expanding window, and then calculates the beta for each one.
from pandas_datareader.data import DataReader as dr
import numpy as np
import pandas as pd
df = (dr(['GOOG', 'SPY'], 'google')['Close']
      .pct_change()
      .dropna())
# i is the asset, m is the market/index
# [0, 1] grabs cov(i, m) from the 2x2 covariance matrix
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m)

def expwins(x, min_periods):
    return [x[:i] for i in range(min_periods, x.shape[0] + 1)]
# Example:
# >>> arr = np.arange(10).reshape(5, 2)
# >>> expwins(arr, min_periods=3)[1]  # the 2nd window of the set
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7]])
min_periods = 21
# Create "blocks" of expanding windows
wins = expwins(df.values, min_periods=min_periods)
# Calculate a beta (single scalar val.) for each
betas = [beta(win[:, 0], win[:, 1]) for win in wins]
betas = pd.Series(betas, index=df.index[min_periods - 1:])
print(betas)
Date
2010-02-03 0.77572
2010-02-04 0.74769
2010-02-05 0.76692
2010-02-08 0.74301
2010-02-09 0.74741
2010-02-10 0.74635
2010-02-11 0.74735
2010-02-12 0.74605
2010-02-16 0.78521
2010-02-17 0.77619
2010-02-18 0.79188
2010-02-19 0.78952
...
2017-06-19 0.97387
2017-06-20 0.97390
2017-06-21 0.97386
2017-06-22 0.97387
2017-06-23 0.97391
2017-06-26 0.97389
2017-06-27 0.97482
2017-06-28 0.97508
2017-06-29 0.97594
2017-06-30 0.97584
2017-07-03 0.97575
2017-07-05 0.97588
dtype: float64
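If memory is the concern, one possible alternative (not the method above) is to skip apply entirely and divide pandas' built-in expanding covariance by the expanding variance, since beta is Cov(asset, market) / Var(market). This is a minimal sketch on the question's toy frame, treating 'a' as the market the way the question's beta(row, 'a', 'b') divides by the variance of 'a'; expanding_beta is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy data from the question; 'a' plays the market (col1),
# 'b' plays the asset (col2).
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

def expanding_beta(asset, market, min_periods=2):
    """Hypothetical helper: beta of `asset` on `market` per expanding window.

    Expanding cov and var both default to ddof=1 in pandas, so each
    ratio equals the ordinary least-squares slope for that window.
    """
    cov = asset.expanding(min_periods=min_periods).cov(market)
    var = market.expanding(min_periods=min_periods).var()
    return cov / var

betas = expanding_beta(df['b'], df['a'])
print(betas)
```

The first entry is NaN because a single observation has no variance. One design note: np.cov defaults to ddof=1 while np.var defaults to ddof=0, so the beta function in the answer is scaled by n/(n-1) relative to this version; the gap shrinks as the window grows.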