Sklearn pipeline feature names: NotFittedError

Question

I am using scikit-learn for a text classification experiment. Now I would like to get the names of the best-performing, selected features. I tried some answers to similar questions, but nothing seems to work. The last lines of code are an example of what I tried. For example, when I try feature_names, I get this error: sklearn.exceptions.NotFittedError: This SelectKBest instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Any solutions?

scaler = StandardScaler(with_mean=False) 

enc = LabelEncoder()
y = enc.fit_transform(labels)

feat_sel = SelectKBest(mutual_info_classif, k=200)  
clf = linear_model.LogisticRegression()

pipe = Pipeline([('vectorizer', DictVectorizer()),
                 ('scaler', StandardScaler(with_mean=False)),
                 ('mutual_info', feat_sel),
                 ('logistregress', clf)])

feature_names = pipe.named_steps['mutual_info']
X.columns[features.transform(np.arange(len(X.columns)))]

python scikit-learn names feature-selection

Bambi 23 jul. 17 at 16:19

1 answer

seralouk · Accepted Answer · 2017-07-23T17:32:46+0000

First you need to fit the pipeline and then call feature_names:

Solution

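import numpy as np
from sklearn import linear_model
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

scaler = StandardScaler(with_mean=False)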

enc = LabelEncoder()
y = enc.fit_transform(labels)

feat_sel = SelectKBest(mutual_info_classif, k=200)  
clf = linear_model.LogisticRegression()

pipe = Pipeline([('vectorizer', DictVectorizer()),
                 ('scaler', StandardScaler(with_mean=False)),
                 ('mutual_info', feat_sel),
                 ('logistregress', clf)])

# Now fit the pipeline using your data
pipe.fit(X, y)

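# Now you can access the fitted steps through pipe.named_steps
features = pipe.named_steps['mutual_info']
# Pass the column indices through the fitted selector (as a single 2D row)
# to recover the names of the selected columns
X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]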

General information

From the example in the documentation you can see:

anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

This sets some initial parameters (k for the anova step and C for the svc step) and then calls fit(X, y) to fit the pipeline.
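To make the step__parameter naming concrete, here is a minimal sketch of that pattern. The synthetic dataset, the f_classif scoring function and the linear kernel are assumptions chosen only so the snippet runs on its own; they are not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data (assumption, for illustration only)
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

# Step names 'anova' and 'svc' are what the '<step>__<param>' keys refer to
anova_svm = Pipeline([('anova', SelectKBest(f_classif, k=5)),
                      ('svc', SVC(kernel='linear'))])

# Override k on the 'anova' step and C on the 'svc' step, then fit everything
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

# After fitting, the selector can report how many features it kept
n_selected = int(anova_svm.named_steps['anova'].get_support().sum())
print(n_selected)
```

Only after fit has been called do the steps in named_steps hold fitted state, which is why asking the selector for its result before fitting raises NotFittedError.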

EDIT

Regarding the new error: since your X is a list of dictionaries, the only way I see to use the columns attribute you want is to convert it with pandas.

from pandas import DataFrame

X = [{'age': 10, 'name': 'Tom'}, {'age': 5, 'name': 'Mark'}]

df = DataFrame(X)
len(df.columns)

Result: 2

Hope it helps
