Is the oob_score_ attribute in scikit-learn an accuracy or an error estimate?

I am using the Random Forest (RF) classifier from the scikit-learn Python package for an ML problem. As a first step, I used cross-validation to compare several algorithms, and RF came out as my choice.

Later I also checked what the RF out-of-bag (OOB) estimate tells me. However, when I compare the value returned in oob_score_ with my CV results, there is a big discrepancy.

The scikit-learn doc tells me:

oob_score : bool

Whether to use out-of-bag samples to estimate the generalization error.

Based on the documentation, I assumed the oob_score_ attribute is an error estimate. But thinking it through, it occurred to me that it might actually measure the accuracy, which would be at least a little closer to my results. I also checked the code and I lean towards accuracy, but I want to be sure... (in that case I find the documentation misleading, BTW).

So: is oob_score_ in scikit-learn the accuracy or the error estimate?

Thank you in advance

1 answer


This is similar to the .score method, which returns the mean accuracy of the model; oob_score_ just generalizes that to the out-of-bag samples. The documentation is indeed a bit lacking here.
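
For a quick sanity check, here is a minimal sketch (the toy dataset and hyperparameters are just placeholders, not your setup) showing that oob_score_ sits on the same 0-1 accuracy scale as .score(), so the corresponding error estimate is simply 1 - oob_score_:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

print("OOB score (accuracy):", clf.oob_score_)        # close to 1.0 on an easy problem
print("OOB error estimate  :", 1.0 - clf.oob_score_)  # what the docstring is hinting at
print("Training accuracy   :", clf.score(X, y))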

As you can see in the source code at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py:



for k in range(self.n_outputs_):
    if (predictions[k].sum(axis=1) == 0).any():
        warn("Some inputs do not have OOB scores. "
             "This probably means too few trees were used "
             "to compute any reliable oob estimates.")

    # Normalize the accumulated per-class predictions into a probability distribution.
    decision = (predictions[k] /
                predictions[k].sum(axis=1)[:, np.newaxis])
    oob_decision_function.append(decision)
    # Fraction of samples whose OOB majority vote equals the true label.
    oob_score += np.mean(y[:, k] ==
                         np.argmax(predictions[k], axis=1), axis=0)


It just calculates the fraction of correct OOB classifications, i.e. the mean OOB accuracy (so 1 - oob_score_ is the corresponding error estimate).
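
To see that in action, here is a sketch (same toy setup as above, all names are placeholders) that reproduces the value by hand from oob_decision_function_ for a single-output classifier; it should match oob_score_ as long as every sample received at least one OOB prediction:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# Majority vote over each sample's out-of-bag trees, then the mean of correct votes.
oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
print(np.mean(oob_pred == y))   # should match clf.oob_score_
print(clf.oob_score_)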
