Combinations of features

I have a feature set (40 features), and my initial idea was to evaluate the classifier on every combination of features I can get. However, after some calculations, I found that the number of combinations would reach the millions, so it would take forever!

I read about the possibility of using the random search method to select random feature subsets. However, every time I run a random search, I get the same feature set. Do I need to change the seed or some other option?

In addition, is random search effective, and can it replace the all-combinations approach?

I would appreciate your expert advice.

Thank you very much in advance,

Ahmad



1 answer


If you want to do attribute selection in WEKA, you have to consider two algorithms: a search method and an attribute evaluator (more on the evaluator below).

As you said, you might not be able to use Exhaustive search because it takes too long. There are greedy alternatives that can give good results (depending on the problem), such as Best first (based on hill climbing). The option you mention, Random search, is another way of generating candidate subsets: it runs random iterations to pick the subsets to be evaluated.

Why are you getting the same subset of selected attributes? Because Random search keeps generating the same candidate subsets, and the evaluator picks the best of them as the final output. If you change the seed parameter, the result should change. Maybe, maybe not. Why? Because if the algorithm runs enough iterations, it will generate the same subsets as in the previous run even though it starts from a different seed (convergence), and the evaluator will then choose the same subset as before.
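The convergence effect can be reproduced with a toy model. The sketch below is not WEKA's Random search implementation; it is a simplified stand-in in Python, and toy_merit, search_fraction, and the choice of "useful" features are all hypothetical names and values invented for the illustration.

```python
import random
from itertools import combinations


def all_subsets(n_features):
    """Every non-empty subset of the feature indices 0..n_features-1."""
    idx = range(n_features)
    return [frozenset(c)
            for k in range(1, n_features + 1)
            for c in combinations(idx, k)]


def toy_merit(subset):
    """Hypothetical evaluator: rewards features 0, 2 and 5, penalizes size."""
    useful = {0, 2, 5}
    return len(subset & useful) - 0.1 * len(subset)


def random_search(n_features, search_fraction, seed):
    """Score a seeded random sample of subsets and return the best one."""
    rng = random.Random(seed)
    candidates = all_subsets(n_features)
    n_samples = max(1, int(search_fraction * len(candidates)))
    sampled = rng.sample(candidates, n_samples)
    return max(sampled, key=toy_merit)


# Sampling the whole space: every seed ends at the same "best" subset.
for seed in (1, 2, 3):
    print(seed, sorted(random_search(8, 1.0, seed)))

# Sampling only 5% of the space: the winner now depends on the seed.
for seed in (1, 2, 3):
    print(seed, sorted(random_search(8, 0.05, seed)))
```

With search_fraction set to 1.0, every seed returns features 0, 2 and 5, because the evaluator sees the same candidates and has a single favourite. With a small fraction the sampled candidates differ between seeds, so the reported subset can differ too, although the same seed always reproduces the same result.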



If you don't want the selector's output to converge, change the seed and also choose a smaller search percent, which limits the search and gives different results.

But in my opinion, if you always get the same result, it is because the evaluator (I don't know which one you are using) has determined that this subset is the "best" for your dataset. I also recommend trying another search method, such as Best first or Genetic search.
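For a feel of how a greedy search avoids enumerating every subset, here is a minimal Python sketch of plain forward selection by hill climbing. It is a simpler technique than WEKA's Best first (which, as far as I know, also backtracks), and toy_merit is again a hypothetical evaluator made up for the example.

```python
def toy_merit(subset):
    """Hypothetical evaluator: rewards features 0, 2 and 5, penalizes size."""
    useful = {0, 2, 5}
    return len(subset & useful) - 0.1 * len(subset)


def forward_selection(n_features, merit):
    """Greedily add the single feature that most improves the merit."""
    selected = frozenset()
    best_score = merit(selected)
    while True:
        # Candidate subsets: the current selection plus one unused feature.
        candidates = [selected | {f}
                      for f in range(n_features) if f not in selected]
        if not candidates:
            break
        best_candidate = max(candidates, key=merit)
        if merit(best_candidate) <= best_score:
            break  # no single addition helps, so stop climbing
        selected, best_score = best_candidate, merit(best_candidate)
    return selected


print(sorted(forward_selection(40, toy_merit)))  # [0, 2, 5]
```

On 40 features this scores at most 40 + 39 + 38 + ... candidates, a few hundred evaluations in the worst case, instead of all 2^40 - 1 subsets. The trade-off is that a greedy climb can stop at a local optimum when features are only useful in combination.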
