Nested GridSearchCV
For a given classification problem, I want to 1) tune the parameters for several model types and 2) find the best tuned model type. I would like to use GridSearchCV for this.
I was able to run the following, but I'm worried that it doesn't work the way I expect, and I'm also worried that nesting GridSearchCV may not be necessary. Is it possible to do this with a single GridSearchCV?
One concern I have with nesting GridSearchCV is that I may be doing nested cross-validation as well, so instead of grid searching on 66% of the training data, I may effectively be grid searching on 43.56% of it (0.66 × 0.66). Another concern is that I have increased the complexity of the code.
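The arithmetic behind that figure can be checked with a few lines. This is only a sketch: it assumes each level of the search trains on roughly two thirds of the data it receives (as with 3-fold cross-validation), and the variable names are made up for illustration.

```python
from fractions import Fraction

# Assumption: each cross-validation level trains on about 66% of its input,
# which is the rounded figure used in the question.
outer_train_fraction = 0.66
inner_train_fraction = 0.66

# The inner search only ever sees the outer training folds,
# so the two fractions multiply.
effective_fraction = outer_train_fraction * inner_train_fraction
print(f"{effective_fraction:.2%}")

# With exactly 3 folds, each fit trains on 2/3 of its input.
exact_fraction = Fraction(2, 3) * Fraction(2, 3)
print(exact_fraction, f"{float(exact_fraction):.2%}")
```

With a different number of folds the fractions change accordingly, for example 4/5 per level with 5-fold cross-validation.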
Here's my nested GridSearchCV example using the iris dataset:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
iris_raw_data = load_iris()
iris_df = pd.DataFrame(np.c_[iris_raw_data.data, iris_raw_data.target],
                       columns=iris_raw_data.feature_names + ['target'])
iris_category_labels = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
iris_df['species_name'] = iris_df['target'].apply(lambda l: iris_category_labels[int(l)])
features = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
target = 'target'
X_train, X_test, y_train, y_test = train_test_split(iris_df[features], iris_df[target], test_size=.33)
pipe_knn = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', KernelPCA(n_components=2)),  # project onto 2 components
    ('clf', KNeighborsClassifier())
])
params_knn = dict(scaler=[None, StandardScaler()],
                  reduce_dim=[None, KernelPCA(n_components=2)],
                  clf__n_neighbors=[2, 5, 15])
grid_search_knn = GridSearchCV(pipe_knn, param_grid=params_knn)
pipe_svc = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', KernelPCA(n_components=2)),  # project onto 2 components
    ('clf', SVC())
])
params_svc = dict(scaler=[None, StandardScaler()],
                  reduce_dim=[None, KernelPCA(n_components=2)],
                  clf__C=[0.1, 1, 10, 100])
grid_search_svc = GridSearchCV(pipe_svc, param_grid=params_svc)
pipe_rf = Pipeline(steps=[
    ('clf', RandomForestClassifier())
])
params_rf = dict(clf__n_estimators=[10, 50, 100],
                 clf__min_samples_leaf=[2, 5, 10])
grid_search_rf = GridSearchCV(pipe_rf, param_grid=params_rf)
# Outer search: the 'subpipes' step is a placeholder that the outer grid
# swaps for each inner grid search in turn.
pipe_meta = Pipeline(steps=[('subpipes', pipe_knn)])
params_meta = dict(subpipes=[grid_search_svc, grid_search_knn, grid_search_rf])
grid_search_meta = GridSearchCV(pipe_meta, param_grid=params_meta)
grid_search_meta.fit(X_train, y_train)
print(grid_search_meta.best_estimator_)
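On the question of whether a single GridSearchCV could do this, one possible approach is to pass `param_grid` as a list of dictionaries and let the grid replace the `clf` step itself. The following is a minimal sketch, not a tested replacement for the code above: it assumes scikit-learn 0.21 or later (for the `'passthrough'` step value), and the names `pipe`, `preprocessing`, and `single_search`, the fixed `random_state`, and `cv=3` are illustrative choices that are not in the original.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

# One pipeline; the grid below swaps or disables each step.
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', KernelPCA(n_components=2)),
    ('clf', KNeighborsClassifier()),
])

# Preprocessing options shared by the KNN and SVC grids.
preprocessing = {
    'scaler': ['passthrough', StandardScaler()],
    'reduce_dim': ['passthrough', KernelPCA(n_components=2)],
}

# One dictionary per model type; each sets the 'clf' step and its own
# hyperparameters, so only valid combinations are searched.
param_grid = [
    {**preprocessing,
     'clf': [KNeighborsClassifier()],
     'clf__n_neighbors': [2, 5, 15]},
    {**preprocessing,
     'clf': [SVC()],
     'clf__C': [0.1, 1, 10, 100]},
    {'scaler': ['passthrough'],
     'reduce_dim': ['passthrough'],
     'clf': [RandomForestClassifier(random_state=0)],
     'clf__n_estimators': [10, 50, 100],
     'clf__min_samples_leaf': [2, 5, 10]},
]

single_search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
single_search.fit(X_train, y_train)
print(single_search.best_params_)
print(single_search.score(X_test, y_test))
```

In this variant every candidate is scored with the same cross-validation folds of `X_train`, so there is only one level of splitting, and `best_params_` reports both the chosen model type and its settings.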