Grid search cross-validation in sklearn
Can I use grid search cross-validation to extract the best parameters with a decision tree classifier? http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
+5
Borys
3 answers
Why not?
I invite you to check the GridSearchCV documentation.
Example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score

# Grid of max_depth values to search over
param_grid = {'max_depth': np.arange(3, 10)}

tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
tree.fit(xtrain, ytrain)

# Evaluate the tuned model on the held-out test set
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)
print('DecisionTree: Area under the ROC curve = {}'.format(tree_performance))
And to extract the best parameters:
tree.best_params_
Out[1]: {'max_depth': 5}
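If you need more than the parameter values themselves, GridSearchCV also exposes the mean cross-validated score and the refit estimator. A minimal sketch continuing the example above (xtest is the same assumed test set):

# Mean cross-validated score of the best parameter combination
print(tree.best_score_)

# Estimator refit on the full training data with the best parameters
best_tree = tree.best_estimator_
best_preds = best_tree.predict_proba(xtest)[:, 1]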
+8
gowithefloww
This answer is great.
However, I would like to point out a couple of links that explain very well how to generalize the selection of the best parameters across several models, with examples (a rough sketch of the idea follows below):
- http://www.codiply.com/blog/hyperparameter-grid-search-across-multiple-models-in-scikit-learn/
- A similar version as an IPython notebook on GitHub: https://github.com/codiply/blog-ipython-notebooks/blob/master/scikit-learn-estimator-selection-helper
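As a rough illustration of the idea in those links (not their exact helper class), one can loop over a dictionary of candidate models and parameter grids and run GridSearchCV on each; the particular estimators and grids below are placeholder assumptions:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Hypothetical candidate models with their parameter grids
candidates = {
    'decision_tree': (DecisionTreeClassifier(), {'max_depth': [3, 5, 7]}),
    'random_forest': (RandomForestClassifier(), {'n_estimators': [50, 100]}),
}

def grid_search_all(X, y, cv=5):
    # Run a grid search per model and collect the best score and parameters
    results = {}
    for name, (model, grid) in candidates.items():
        gscv = GridSearchCV(model, grid, cv=cv)
        gscv.fit(X, y)
        results[name] = (gscv.best_score_, gscv.best_params_)
    return results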
0
Rafael Valero
Here is the code for a grid search over a decision tree:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

def dtree_grid_search(X, y, nfolds):
    # Create a dictionary of all values we want to test
    param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': np.arange(3, 15)}
    # Decision tree model
    dtree_model = DecisionTreeClassifier()
    # Use grid search to test all values with nfolds-fold cross-validation
    dtree_gscv = GridSearchCV(dtree_model, param_grid, cv=nfolds)
    # Fit the grid search to the data
    dtree_gscv.fit(X, y)
    return dtree_gscv.best_params_
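For example, assuming the iris dataset from sklearn purely for illustration, the function can be called like this:

from sklearn.datasets import load_iris

# Hypothetical usage with 5-fold cross-validation
X, y = load_iris(return_X_y=True)
print(dtree_grid_search(X, y, 5))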
0
Avinash