Scikit-learn SGDClassifier warm start ignored

I am trying to use SGDClassifier from scikit-learn version 0.15.1. There seems to be no way to set a convergence criterion other than the number of iterations (n_iter). So I would like to do it manually: check the error on each iteration, then run additional iterations until the improvement is small enough.

Unfortunately, neither the warm_start flag nor passing coef_init / intercept_init to fit seems to actually warm-start the optimization — both appear to start from scratch.

What should I do? Without a real convergence criterion or a working warm start, the classifier is unusable for me.

Notice in the output below how the bias (offset) grows in magnitude with each restart, and how the average loss also jumps back up on each restart before decreasing again over further iterations. After 250 iterations, the offset is -3.44 and the average loss is 1.46.

sgd = SGDClassifier(loss='log', alpha=alpha, verbose=1, shuffle=True, 
                    warm_start=True)
print('INITIAL FIT')
sgd.fit(X, y, sample_weight=sample_weight)
sgd.n_iter = 1
print('\nONE MORE ITERATION')
sgd.fit(X, y, sample_weight=sample_weight)
sgd.n_iter = 3
print('\nTHREE MORE ITERATIONS')
sgd.fit(X, y, sample_weight=sample_weight)


INITIAL FIT
-- Epoch 1
Norm: 254.11, NNZs: 92299, Bias: -5.239955, T: 122956, Avg. loss: 28.103236
Total training time: 0.04 seconds.
-- Epoch 2
Norm: 138.81, NNZs: 92598, Bias: -5.180938, T: 245912, Avg. loss: 16.420537
Total training time: 0.08 seconds.
-- Epoch 3
Norm: 100.61, NNZs: 92598, Bias: -5.082776, T: 368868, Avg. loss: 12.240537
Total training time: 0.12 seconds.
-- Epoch 4
Norm: 74.18, NNZs: 92598, Bias: -5.076395, T: 491824, Avg. loss: 9.859404
Total training time: 0.17 seconds.
-- Epoch 5
Norm: 55.57, NNZs: 92598, Bias: -5.072369, T: 614780, Avg. loss: 8.280854
Total training time: 0.21 seconds.

ONE MORE ITERATION
-- Epoch 1
Norm: 243.07, NNZs: 92598, Bias: -11.271497, T: 122956, Avg. loss: 26.148746
Total training time: 0.04 seconds.

THREE MORE ITERATIONS
-- Epoch 1
Norm: 258.70, NNZs: 92598, Bias: -16.058395, T: 122956, Avg. loss: 29.666688
Total training time: 0.04 seconds.
-- Epoch 2
Norm: 142.24, NNZs: 92598, Bias: -15.809559, T: 245912, Avg. loss: 17.435114
Total training time: 0.08 seconds.
-- Epoch 3
Norm: 102.71, NNZs: 92598, Bias: -15.715853, T: 368868, Avg. loss: 12.731181
Total training time: 0.12 seconds.

      

1 answer


warm_start=True reuses the already-fitted coefficients as the starting point, but it restarts the learning-rate schedule — which is why the loss jumps back up after each call to fit.

If you want to check for convergence manually, use partial_fit instead of fit, as @AdrienNK suggested:



sgd = SGDClassifier(loss='log', alpha=alpha, verbose=1, shuffle=True,
                    n_iter=1)
# the first call to partial_fit must be given all class labels
sgd.partial_fit(X, y, classes=np.unique(y))
# after 1st iteration
sgd.partial_fit(X, y)
# after 2nd iteration
...

      
