Statsmodels - broadcast shapes different?
I am trying to perform logistic regression on a dataset that contains a boolean target variable ("default") and two features ("fico_interp", "home_ownership_int") using the logit module in statsmodels. All three columns come from one data frame, traindf:
from sklearn import datasets
import statsmodels.formula.api as smf
lmf = smf.logit('default ~ fico_interp + home_ownership_int',traindf).fit()
Which generates the error message:
ValueError: operands could not be broadcast together with shapes (40406,2) (40406,)
How can this happen?
1 answer
The problem is that traindf['default'] contains values that are not numeric.
The following code reproduces the error:
import pandas as pd, numpy as np, statsmodels.formula.api as smf
df = pd.DataFrame(np.random.randn(1000,2), columns=list('AB'))
df['C'] = ((df['B'] > 0)*1).apply(str)
lmf = smf.logit('C ~ A', df).fit()
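To see where the extra column in the (n, 2) shape comes from, it may help to build the design matrices directly. The sketch below assumes the formula is parsed by patsy (which may not hold for every statsmodels version) and reuses the column names from the reproduction; the seeded generator and the `C_num` column are additions for illustration only.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Same layout as the reproduction, with a fixed seed (an assumption, for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=list("AB"))
df["C"] = (df["B"] > 0).astype(int).astype(str)  # string labels '0' / '1'

# A string column on the left-hand side is treated as categorical and
# dummy-coded, so the response gets one column per label
y_str, X = dmatrices("C ~ A", df, return_type="dataframe")
print(y_str.shape)

# A numeric 0/1 column (hypothetical name C_num) stays a single column
df["C_num"] = df["C"].astype(int)
y_num, _ = dmatrices("C_num ~ A", df, return_type="dataframe")
print(y_num.shape)
```

The two-column response is what logit then fails to combine with a one-dimensional array of predictions.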
And the following code is a possible way to fix this instance:
df.replace(to_replace={'C' : {'1': 1, '0': 0}}, inplace = True)
lmf = smf.logit('C ~ A', df).fit()
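An alternative that does not depend on how replace infers the resulting dtype is to cast the column explicitly. This is a minimal sketch, assuming the column holds only '0'/'1' strings or booleans and has no missing values; the column `D` is a hypothetical stand-in for a boolean target such as the one in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=list("AB"))

# Case 1: string labels, as in the reproduction above
df["C"] = (df["B"] > 0).astype(int).astype(str)
df["C"] = df["C"].astype(int)  # '0'/'1' -> 0/1

# Case 2: a boolean target (hypothetical column D)
df["D"] = df["B"] > 0
df["D"] = df["D"].astype(int)  # False/True -> 0/1

print(df.dtypes)
```

With an integer column, smf.logit('C ~ A', df).fit() should receive a single-column response. For the data in the question, the analogous step would presumably be traindf['default'] = traindf['default'].astype(int).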
This post reports a similar issue.