Optimizer tuning in a scikit-learn Gaussian process regression
I am trying to use GaussianProcessRegressor from scikit-learn 0.18.1.
I train on 200 data points and use a kernel with 13 hyperparameters: a constant kernel multiplied by a radial basis function with twelve length-scale elements. The model runs without complaints, but if I run the same script multiple times I notice that I sometimes get different solutions. It may be worth noting that some of the optimized hyperparameters end up hitting the bounds I gave them (I am still working out which features matter).
I tried increasing the "n_restarts_optimizer" parameter to 50, and although it takes significantly longer, it does not remove the apparent randomness. It also seems possible to change the optimizer itself, but I had no luck with that. From a quick scan, the most syntactically similar options are scipy's fmin_tnc and fmin_slsqp (other scipy optimizers do not accept bounds). However, using either of them causes other problems: for example, fmin_tnc does not return the value of the objective function at the minimum.
Are there any suggestions on how to make the script more deterministic? Ideally, I would like it to print the same values regardless of the run, because as it stands it looks a bit like a lottery (and therefore any conclusions may be questionable).
A snippet of code that I am using:
from sklearn.gaussian_process import GaussianProcessRegressor as GPR
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
lbound = 1e-2
rbound = 1e1
n_restarts = 50
n_features = 12 # Actually determined elsewhere in the code
kernel = C(1.0, (lbound,rbound)) * RBF(n_features*[10], (lbound,rbound))
gp = GPR(kernel=kernel, n_restarts_optimizer=n_restarts)
gp.fit(train_input, train_outputs)
test_model, sigma2_pred = gp.predict(test_input, return_std=True)
print(gp.kernel_)
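To illustrate the optimizer swap mentioned above: the optimizer argument of GaussianProcessRegressor also accepts a callable, so one way around fmin_tnc not returning the objective value is to wrap scipy.optimize.minimize, which does. A minimal sketch (tnc_optimizer is just an illustrative name, not something provided by scikit-learn):

from scipy.optimize import minimize

def tnc_optimizer(obj_func, initial_theta, bounds):
    # obj_func(theta) returns the negative log-marginal likelihood and its
    # gradient, so jac=True lets minimize use both; res.fun is the objective
    # value at the minimum, which fmin_tnc does not report.
    res = minimize(obj_func, initial_theta, method="TNC",
                   jac=True, bounds=bounds)
    return res.x, res.fun

gp = GPR(kernel=kernel, optimizer=tnc_optimizer, n_restarts_optimizer=n_restarts)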
Here is where random values are used to initialize the optimization:
Since the LML may have multiple local optima, the optimizer can be restarted repeatedly by specifying n_restarts_optimizer.
As far as I understand, there will always be a random factor, and sometimes it will land in one of the local minima you mentioned.
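If reproducibility is the main goal, GaussianProcessRegressor also takes a random_state argument that seeds the draws used for those restarts, so fixing it should make repeated runs of the same script print the same kernel_. A minimal sketch, reusing the kernel and training arrays from the question:

from sklearn.gaussian_process import GaussianProcessRegressor as GPR

# Fixing random_state makes the randomly drawn restart points reproducible,
# so repeated runs converge to the same hyperparameters (though not
# necessarily to the global optimum of the LML).
gp = GPR(kernel=kernel, n_restarts_optimizer=50, random_state=0)
gp.fit(train_input, train_outputs)
print(gp.kernel_)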
If your data allows it (an invertible X matrix), you can use the normal equation, which has no random factor, if it suits your needs.
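A minimal sketch of that idea, assuming it means ordinary least squares on the raw features (X and y here are hypothetical training arrays, not names from the question):

import numpy as np

def normal_equation(X, y):
    # Closed-form least-squares weights: w = (X^T X)^(-1) X^T y.
    # Deterministic, but only valid when X^T X is invertible
    # (no perfectly collinear columns in X).
    return np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))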
You can also ensemble on top of this (somewhat like a random forest): run the algorithm multiple times and choose the best fit or an aggregate value; you have to weigh consistency against precision.
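A minimal sketch of that "keep the best of several runs" idea, reusing the kernel and training arrays from the question and comparing fits by their log marginal likelihood:

from sklearn.gaussian_process import GaussianProcessRegressor as GPR

# Fit several times with different seeds and keep the model whose optimized
# kernel gives the highest log marginal likelihood on the training data.
fits = [GPR(kernel=kernel, n_restarts_optimizer=10, random_state=seed)
            .fit(train_input, train_outputs)
        for seed in range(5)]
best = max(fits, key=lambda g: g.log_marginal_likelihood(g.kernel_.theta))
print(best.kernel_)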
I hope I understood your question correctly.