Is the oob_score_ attribute in scikit-learn an accuracy or an error estimate?

I am using the Random Forest (RF) classifier from the scikit-learn Python package for an ML problem. As a first step, I used cross-validation to compare several algorithms, and RF came out as my choice.

Later I also checked what the RF out-of-bag (OOB) estimate tells me. However, when I compare the value returned in oob_score_ with my CV results, there is a big discrepancy.

The scikit-learn doc tells me:

oob_score : bool

Whether to use out-of-bag samples to estimate the generalization error.

Based on the documentation, I assumed the oob_score_ attribute is an error estimate. But thinking it through, it occurred to me that it might actually measure the accuracy, which would be at least a little closer to my results. I also checked the code and I lean towards accuracy, but I want to be sure... (in that case I find the documentation misleading, BTW).

So: is oob_score_ in scikit-learn the accuracy or the error estimate?

Thank you in advance

1 answer


This is similar to the .score method, which returns the mean accuracy of the model; oob_score_ just generalizes that to the out-of-bag samples. The documentation is indeed a bit lacking here.
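
For a quick sanity check, here is a minimal sketch (the toy dataset and hyperparameters are just placeholders, not your setup) showing that oob_score_ sits on the same 0-1 accuracy scale as .score(), so the corresponding error estimate is simply 1 - oob_score_:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

print("OOB score (accuracy):", clf.oob_score_)        # close to 1.0 on an easy problem
print("OOB error estimate  :", 1.0 - clf.oob_score_)  # what the docstring is hinting at
print("Training accuracy   :", clf.score(X, y))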

As you can see in the source code at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py:



for k in range(self.n_outputs_):
    if (predictions[k].sum(axis=1) == 0).any():
        warn("Some inputs do not have OOB scores. "
             "This probably means too few trees were used "
             "to compute any reliable oob estimates.")

    # Normalize the accumulated per-class predictions into a probability distribution.
    decision = (predictions[k] /
                predictions[k].sum(axis=1)[:, np.newaxis])
    oob_decision_function.append(decision)
    # Fraction of samples whose OOB majority vote equals the true label.
    oob_score += np.mean(y[:, k] ==
                         np.argmax(predictions[k], axis=1), axis=0)


It just calculates the fraction of correct OOB classifications, i.e. the mean OOB accuracy (so 1 - oob_score_ is the corresponding error estimate).
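
To see that in action, here is a sketch (same toy setup as above, all names are placeholders) that reproduces the value by hand from oob_decision_function_ for a single-output classifier; it should match oob_score_ as long as every sample received at least one OOB prediction:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# Majority vote over each sample's out-of-bag trees, then the mean of correct votes.
oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
print(np.mean(oob_pred == y))   # should match clf.oob_score_
print(clf.oob_score_)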
