Using pandas pd.cut to generate categorical variable with statsmodels
I tried using pd.cut to create a categorical variable from a continuous variable. I would like to include it as a dummy variable in a subsequent regression. When I fit a model with a categorical variable created this way, I get the error
TypeError: data type not understood.
Below is a test case.
import numpy as np
import pandas as pd
import statsmodels as sm
import statsmodels.formula.api as smf
df = pd.DataFrame(np.random.randn(6,4))
df.columns = ['A', 'B', 'C', 'D']
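# Bin column D into two equal-width intervals (returns a pandas Categorical)
df['ttt'] = pd.cut(df['D'], bins=2)
# This is the call that raises the TypeError
test = smf.ols('A ~ B + ttt', data=df).fit()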
I'm pretty sure I did something clearly wrong. Any help would be appreciated.
1 answer
I'm not sure where statsmodels currently stands on supporting the new Categorical type in pandas. For now, you may have to convert the categorical back to an object dtype for it to work (please check that the resulting OLS fit is appropriate; I don't know the full details of what you are trying to do):
df['ttt_fixed'] = df.ttt.astype(object)
test = smf.ols('A ~ B + ttt_fixed', data=df).fit()
test.summary()
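For reference, here is a minimal, self-contained sketch of the same workaround. The explicit labels "low" and "high", the seeded random generator, and the sample size of 50 are my own assumptions for illustration and are not part of the question; the labels are there so the bins become plain strings once converted to object dtype.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Reproducible toy data (seed and sample size are illustrative assumptions)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 4)), columns=["A", "B", "C", "D"])

# Bin D into two equal-width intervals; the labels are hypothetical names
df["ttt"] = pd.cut(df["D"], bins=2, labels=["low", "high"])

# Convert the Categorical back to object dtype (plain Python strings here)
df["ttt_fixed"] = df["ttt"].astype(object)

# The formula interface treats a string column as categorical and dummy-codes it
test = smf.ols("A ~ B + ttt_fixed", data=df).fit()
print(test.params)
```

The fitted model should contain an intercept, a slope for B, and one dummy coefficient for ttt_fixed, with the other level absorbed into the intercept as the reference category.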