Python machine learning: feature selection

I am working on a classification problem involving written text, and I am wondering how important it is to perform some sort of "feature selection" procedure to improve the classification results.

I use a number of features (about 40) for each sample, but I'm not sure whether all of them are really relevant, or in which combinations. I am experimenting with SVM (scikits) and LDAC (mlpy).

I suspect that a mix of relevant and irrelevant features is giving me poor classification results. Do I need to run a feature selection procedure before classification?

Scikits has a tree-based RFE procedure that can rank features. Does it make sense to rank features with tree-based RFE, keep the most important ones, and then do the actual classification with an SVM (non-linear) or LDAC? Or should I implement some kind of wrapper method that ranks features using the same classifier (trying to classify with many different feature subsets would be very time-consuming)?
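The two options in the question can be compared directly. This is a minimal sketch, assuming the current scikit-learn API (the successor of the scikits.learn package mentioned here) and a synthetic 40-feature dataset from make_classification standing in for the real text features; LDAC from mlpy is not shown. Option 1 ranks features with a forest of randomized trees (SelectFromModel keeps those above the mean importance) and then classifies with an RBF SVM; option 2 is a wrapper-style RFECV driven by a linear SVM, because RFE needs `coef_` or `feature_importances_`, which a non-linear SVC does not expose.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFECV, SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for ~40 text features (assumption): 8 are informative.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_redundant=4, random_state=0)

# Option 1: rank features by tree importance, keep those above the mean
# importance, then classify with a non-linear (RBF) SVM. Selection is inside
# the pipeline, so it is refit on each training fold only.
tree_then_svm = make_pipeline(
    SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=0)),
    StandardScaler(),
    SVC(kernel="rbf"),
)

# Option 2: wrapper method. Recursive feature elimination with a linear SVM,
# choosing the number of kept features by internal 5-fold cross-validation.
rfe_svm = make_pipeline(
    StandardScaler(),
    RFECV(SVC(kernel="linear"), step=1, cv=5),
)

results = {}
for name, model in [("trees -> RBF SVM", tree_then_svm),
                    ("RFECV linear SVM", rfe_svm)]:
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The outer `cross_val_score` is what makes the comparison fair: whichever option scores better on held-out folds for your own data is the one worth keeping.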



2 answers

Just try it and see whether it improves the classification score as measured by cross-validation. Also, before trying RFE, I would try less CPU-intensive schemes such as univariate feature selection with chi2.
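A minimal sketch of that advice, assuming the current scikit-learn API and a synthetic 40-feature dataset in place of the real text features; the values of k are arbitrary illustrations. chi2 requires non-negative inputs (natural for word counts or frequencies), so the synthetic features are first rescaled to [0, 1] with MinMaxScaler. Keeping the selector inside the pipeline means the chi2 scores are computed on the training folds only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features (assumption).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_redundant=4, random_state=0)


def cv_accuracy(model):
    """Mean 5-fold cross-validated accuracy of a model on (X, y)."""
    return cross_val_score(model, X, y, cv=5).mean()


# Baseline: all features, no selection.
baseline = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
scores = {"all 40 features": cv_accuracy(baseline)}

# Univariate selection: keep the k features with the highest chi2 statistic.
for k in (5, 10, 20):
    selected = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=k),
                             SVC(kernel="rbf"))
    scores[f"best {k} by chi2"] = cv_accuracy(selected)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

If none of the reduced feature sets beats the baseline by more than the fold-to-fold noise, selection is probably not worth the extra complexity.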



Having 40 features isn't too bad. Some machine learning algorithms are hampered by irrelevant features, but many are fairly robust to them (e.g. naive Bayes, SVMs, decision trees). You probably don't need to do feature selection unless you decide to add many more features.
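One way to check how robust a given classifier is on your own problem is to append columns of pure noise and watch the cross-validated accuracy. This is a minimal sketch, assuming the current scikit-learn API, a small synthetic dataset, and GaussianNB as the naive Bayes variant; the noise counts are arbitrary, and how much each model degrades depends on the data, so no particular outcome is implied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# 10 synthetic features, 6 of them informative (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=2, random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for n_noise in (0, 30, 200):
    # Append columns of Gaussian noise that carry no class information;
    # n_noise=30 gives 40 features in total, as in the question.
    X_noisy = np.hstack([X, rng.standard_normal((X.shape[0], n_noise))])
    for name, model in models.items():
        acc = cross_val_score(model, X_noisy, y, cv=5).mean()
        results[(name, n_noise)] = acc
        print(f"{name:14s} +{n_noise:3d} noise features: {acc:.3f}")
```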

It's not a bad idea to throw away useless features, but don't spend a lot of your own effort on it unless you have a good reason to.


