Machine learning: optimal parameter values within a reasonable time frame

Sorry if this is a duplicate.

I have a two-class prediction model; it has n configurable (numeric) parameters. The model can work very well if these parameters are tuned correctly, but specific values for them are hard to find. So far I have used grid search (providing, say, m candidate values for each parameter). This means m^n models to train, which is very time-consuming even when run in parallel on a 24-core machine.
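To illustrate the combinatorics, here is a minimal sketch of the exhaustive grid search described above; the candidate values are placeholders:

```python
# With n parameters and m candidate values each, exhaustive grid
# search enumerates m**n configurations, each requiring a full
# training run. The values below are illustrative only.
from itertools import product

m_values = [0.1, 1.0, 10.0]      # m = 3 candidate values per parameter
param_grid = [m_values] * 4      # n = 4 parameters

configs = list(product(*param_grid))
print(len(configs))              # 3**4 = 81 training runs
```

Even modest m and n make this explode: with m = 10 and n = 6 you are already at a million training runs.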

I also tried fixing all parameters except one and varying only that one (which gives m × n runs), but it is not obvious to me what to do with the results. Below is an approximate plot of precision (triangles) and recall (dots) for the negative (red) and positive (blue) samples:


Simply taking the "winning" value for each parameter found this way and combining them does not lead to better (or even good) prediction results. I thought about fitting a regression on the parameter sets with precision/recall as the dependent variable, but I don't expect a regression with more than 5 independent variables to be much faster than the grid-search scenario.

What would you suggest for finding good parameter values within a reasonable amount of time? Apologies if this has an obvious (or well-documented) answer.

+3




2 answers


I would use random search (pick random values for each of your parameters within a range you consider reasonable, and evaluate each such randomly drawn configuration), which you can keep running for as long as you can afford. This paper presents experiments showing that it is at least as good as grid search:

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. The empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time.



For what it's worth, I used scikit-learn's randomized search on a text-classification problem that required tuning about 10 hyper-parameters, with very good results after only about 1000 iterations.
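A minimal sketch of this approach using scikit-learn's `RandomizedSearchCV`; the SVC classifier, the toy dataset, and the parameter ranges here are illustrative placeholders, not the poster's actual model:

```python
# Random search: sample hyper-parameter configurations from given
# distributions instead of enumerating a full grid. Each of the
# n_iter samples is evaluated with cross-validation.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

param_distributions = {
    "C": loguniform(1e-2, 1e2),       # sample on a log scale
    "gamma": loguniform(1e-4, 1e0),
}
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=20,
    scoring="f1", cv=3, n_jobs=-1, random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The key practical point is that `n_iter` is a budget you choose, independent of how many parameters there are, whereas grid search grows exponentially with the parameter count.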

+2




I suggest a Simulated Annealing Simplex algorithm:



  • Very easy to use. Just give it n + 1 starting points and let it run until some custom stopping criterion is met (either a number of iterations or convergence).

  • Implemented in practically every language.

  • No derivatives required.

  • More robust against local optima than the method you are currently using.
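As a sketch of the simplex idea, here is plain Nelder-Mead (SciPy's implementation, without the annealing component) treating cross-validated error as a black-box function of the hyper-parameters; the SVC objective, the toy dataset, and the starting point are illustrative assumptions:

```python
# Nelder-Mead simplex search over hyper-parameters: each objective
# evaluation trains and cross-validates the model, and the simplex
# moves toward configurations with lower validation error.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

def objective(log_params):
    # Search in log-space so C and gamma stay positive.
    C, gamma = np.exp(log_params)
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
    return 1.0 - score  # minimize error = maximize accuracy

result = minimize(objective, x0=np.log([1.0, 0.1]),
                  method="Nelder-Mead", options={"maxiter": 50})
print(np.exp(result.x))  # best (C, gamma) found
```

Note that cross-validation noise makes this objective slightly stochastic, which is one reason the annealing variant (accepting occasional uphill moves) can help escape local optima.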

+1



