Is the oob_score_ attribute in scikit-learn an accuracy or an error estimate?
I am using the Random Forest (RF) classifier from the scikit-learn Python package for an ML problem. In a first step, I used cross-validation to compare several algorithms, and RF came out as my choice.
Later I also checked what the RF out-of-bag (OOB) estimate tells me. However, when I compare the value returned in oob_score_ with my CV results, there is a big discrepancy.
The scikit-learn doc tells me:
oob_score: bool
Use samples outside the bag to assess generalization error.
Based on the doc, I assumed that the oob_score_ attribute is an error estimate. But thinking it over, it occurred to me that it might actually measure accuracy, which would be at least a little closer to my results. I also looked at the code, and I lean toward accuracy, but I want to be sure... (in that case I find the documentation misleading, by the way).
So: is oob_score_ in scikit-learn the accuracy or an error estimate?
Thank you in advance
It is analogous to the .score method, which returns the accuracy of the model; it simply applies the same idea in the out-of-bag setting. The documentation is indeed a bit lacking here.
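As a quick sanity check (a minimal sketch, assuming scikit-learn is installed and using the iris toy dataset), you can see that oob_score_ behaves like an accuracy: for a reasonable forest it comes out close to 1, whereas an error estimate would be close to 0:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, oob_score=True,
                             random_state=0).fit(X, y)

# An accuracy lies in [0, 1] and is high for a good model;
# an error estimate for the same model would be near 0 instead.
print(clf.oob_score_)
```

On a dataset this easy the printed value is well above 0.5, which only makes sense if it is an accuracy.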
As you can see in the code at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py
for k in range(self.n_outputs_):
    if (predictions[k].sum(axis=1) == 0).any():
        warn("Some inputs do not have OOB scores. "
             "This probably means too few trees were used "
             "to compute any reliable oob estimates.")

    decision = (predictions[k] /
                predictions[k].sum(axis=1)[:, np.newaxis])
    oob_decision_function.append(decision)
    oob_score += np.mean(y[:, k] ==
                         np.argmax(predictions[k], axis=1), axis=0)
It simply computes the fraction of correctly classified samples, i.e. the accuracy, not the error.
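You can reproduce that computation by hand (a sketch assuming scikit-learn is installed; iris labels are already the integers 0, 1, 2, so the argmax over the OOB class probabilities is directly the predicted label):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                             random_state=0).fit(X, y)

# oob_decision_function_ holds per-sample OOB class probabilities;
# argmax picks the predicted class, and the mean of the matches is
# exactly the stored oob_score_ -- an accuracy.
manual = np.mean(y == np.argmax(clf.oob_decision_function_, axis=1))
assert np.isclose(manual, clf.oob_score_)
```

With 200 trees on 150 samples, every sample is out-of-bag for at least one tree, so no division-by-zero warning is triggered.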