Using pandas pd.cut to generate categorical variable with statsmodels

Question

I used pd.cut to create a categorical variable from a continuous variable, and I would like to include it as a dummy variable in a subsequent regression. When I use a categorical variable created this way in a formula, I get the error

TypeError: data type not understood.

Below is a test case.

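import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Six rows of random data in four columns
df = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'])
# Bin column D into two equal-width intervals (the result is a pandas Categorical)
df['ttt'] = pd.cut(df['D'], bins=2)
# This is the line that raised the TypeError for me
test = smf.ols('A ~ B + ttt', data=df).fit()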

I'm pretty sure I did something clearly wrong. Any help would be appreciated.

python pandas statsmodels categorical-data

Tim Beatty 12 nov. 14 at 4:29

1 answer

Marius · Accepted Answer · 2014-11-12T05:02:03+0000

I'm not sure exactly where statsmodels stands on supporting the new Categorical type in pandas. For now, you might have to convert the categorical back to an object dtype for it to work (please check that the resulting OLS fit is appropriate; I don't know the full details of what you are trying to do):

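# Cast the Categorical to plain Python objects. The builtin object is used here
# because the np.object alias was removed in NumPy 1.24.
df['ttt_fixed'] = df.ttt.astype(object)
test = smf.ols('A ~ B + ttt_fixed', data=df).fit()
test.summary()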
