Reproducing the thresholding done with xgboost.train() using XGBClassifier()

I got xgboost to make good predictions using xgboost.train():

import xgboost as xgb
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=.6)
xgtrain = xgb.DMatrix(X_train, y_train)
xgtest = xgb.DMatrix(X_test)

param = {'max_depth': 7, 'silent': 1}
bst = xgb.train(param, xgtrain, num_boost_round=2)

# raw scores from the booster, then thresholded at .28 into 0s and 1s
y_pred = bst.predict(xgtest)
y_pred = [1. if y_cont > .28 else 0. for y_cont in y_pred]
y_true = y_test

This approach did not give good results at first (I am trying to maximize the f1 score), until I realized that the f1 score increases significantly when I apply a threshold to the raw outputs. That threshold turned out to be .28. Here are some of the predictions before thresholding and converting them to 0s and 1s:

[ 0.25447303  0.25383738  0.24621713 ...,  0.24621713  0.24621713 0.24621713]
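
For reference, this is roughly how I score the thresholded predictions (a minimal sketch; it assumes sklearn's f1_score and the y_pred / y_true from the block above):

from sklearn.metrics import f1_score

# y_pred has already been thresholded at .28 into 0s and 1s
print(f1_score(y_true, y_pred))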

But now I want to tune my parameters (using GridSearchCV()), which means I need to reproduce what I did with xgboost.train() above using XGBClassifier().

I understand that things can get tricky because the default objective in xgboost.train() is None, while for XGBClassifier() it is 'binary:logistic'. XGBClassifier().predict() returns a class, not a probability, which is useful in most cases, but not here. I tried predict_proba() with XGBClassifier() and then applied the threshold, but that seemed pretty useless, since the probabilities I was getting were very close to 0 and 1:

[[  9.99445975e-01   5.54045662e-04]
 [  9.89062011e-01   1.09380139e-02]
 [  9.95234787e-01   4.76523908e-03]
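
This is roughly how I applied the cutoff to the predict_proba() output (a sketch; clf here stands for a fitted XGBClassifier, and column 1 of predict_proba() is the positive-class probability):

proba = clf.predict_proba(X_test)

# same .28 cutoff, applied to the positive-class probability
y_pred = [1. if p[1] > .28 else 0. for p in proba]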

How can I complete the code below so that it is equivalent to the xgboost.train() approach above, but uses XGBClassifier()? When I tried XGBClassifier() without a threshold, I got a terrible f1 score.

from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=.6)
rf = XGBClassifier(max_depth=7, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', nthread=-1, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, seed=0, missing=None)
rf = rf.fit(X_train, y_train)
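
One idea I am considering (just a sketch, not something I have verified) is to wrap the threshold into a custom scorer, so that GridSearchCV() optimizes the f1 score at the .28 cutoff rather than at the default .5; the param_grid values below are placeholders:

from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

def f1_at_threshold(y_true, y_proba, threshold=.28):
    # y_proba is the positive-class probability; apply the cutoff before scoring
    y_pred = [1. if p > threshold else 0. for p in y_proba]
    return f1_score(y_true, y_pred)

# needs_proba=True makes the scorer receive the positive-class predict_proba() output
scorer = make_scorer(f1_at_threshold, needs_proba=True)

grid = GridSearchCV(XGBClassifier(objective='binary:logistic', seed=0),
                    param_grid={'max_depth': [3, 5, 7]},
                    scoring=scorer)
grid.fit(X_train, y_train)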
