sklearn classification_report with input from a pandas DataFrame produces: "TypeError: not all arguments converted during string formatting"

I am trying to run sklearn.metrics.classification_report on data held in a pandas DataFrame. The df_joined DataFrame has 100 rows and looks like this:

Timestamp    Label       Pred
2016-10-05   29.75  30.781430
2016-10-06   30.35  31.379146
2016-10-07   31.59  31.174824
2017-02-13   29.63  29.875497
2017-02-14   29.60  29.923161
2017-02-15   30.22  30.257284
2017-02-16   30.12  30.374257
2017-02-17   30.09  30.357196
2017-02-20   31.03  30.971070
2017-02-21   31.05  30.930189

      

Now I am trying to print the classification report with:

print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined[label],df_joined['Pred'] )

      

and I get the error:

File "\Python\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\sklearn\utils\multiclass.py", line 106, in unique_labels
    raise ValueError("Unknown label type: %r" % ys)

TypeError: not all arguments converted during string formatting

I also tried passing the underlying NumPy arrays instead:

sklearn.metrics.classification_report(df_joined[label].values, df_joined['Pred'].values)

but it produces the same error.

Does anyone have an idea where this is coming from?


2 answers


I suppose classification_report measures how well you classified / predicted the label of each data point, not its actual value. Labels are meant to be discrete classes, not continuous floats; the examples in the sklearn documentation and user guide use integer labels.
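As for why you see a TypeError rather than the intended ValueError: the traceback line builds its message with "%r" % ys. If ys is a tuple holding both y_true and y_pred (an assumption, based on the function accepting several arrays), the % operator treats each element as a separate argument for the single %r placeholder. A minimal sketch of that behaviour in plain Python, using hypothetical helper names that are not part of sklearn:

```python
def build_message(*ys):
    # ys is a tuple of every positional argument, so with two arrays
    # "%r" % ys supplies two arguments for one placeholder.
    return "Unknown label type: %r" % ys


def build_message_fixed(*ys):
    # Wrapping ys in a one-element tuple formats it as a single value.
    return "Unknown label type: %r" % (ys,)


y_true = [29.75, 30.35]
y_pred = [30.78143, 31.379146]

try:
    build_message(y_true, y_pred)
    error_text = None
except TypeError as exc:
    error_text = str(exc)

fixed_text = build_message_fixed(y_true, y_pred)
```

So the TypeError most likely hides the real complaint, which is that the label type (continuous floats) is not supported.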

The parameter documentation hints at this as well: the only alternative to a 1-dimensional array is a label indicator array, a construct that only makes sense for labels.

sklearn.metrics.classification_report(y_true, y_pred, labels=None,target_names=None, sample_weight=None, digits=2)

y_true : 1d array-like, or label indicator array / sparse matrix

    Ground truth (correct) target values.

y_pred : 1d array-like, or label indicator array / sparse matrix

    Estimated targets as returned by a classifier.

...

      



If your data were integer labels, then the exact data format you passed would work fine:

# Does not raise an error 
classification_report(df_joined['Label'].astype(int), df_joined['Pred'].astype(int))
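Keep in mind that astype(int) truncates, so 29.75 becomes 29 and 30.78 becomes 30, and the report then scores agreement between those integer buckets rather than the quality of the price prediction. A rough illustration in plain Python using the ten rows shown in the question, assuming int() truncates these positive values the same way astype(int) does:

```python
labels = [29.75, 30.35, 31.59, 29.63, 29.60,
          30.22, 30.12, 30.09, 31.03, 31.05]
preds = [30.781430, 31.379146, 31.174824, 29.875497, 29.923161,
         30.257284, 30.374257, 30.357196, 30.971070, 30.930189]

# Truncate each value to its integer part, as astype(int) would.
true_classes = [int(v) for v in labels]
pred_classes = [int(v) for v in preds]

# Fraction of rows whose truncated label and prediction agree.
matches = sum(1 for t, p in zip(true_classes, pred_classes) if t == p)
accuracy = float(matches) / len(true_classes)
```

Rows such as 31.03 versus 30.97 count as misclassified here even though the prediction is off by only 0.06, which is why the cast is a workaround rather than a meaningful evaluation.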

      

You can learn more about the different sklearn scoring tools in the user guide section "Model evaluation: quantifying the quality of predictions" and choose the one that suits the model you want to evaluate.
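Since Label and Pred in the question are continuous prices, a regression metric from that guide is probably a closer fit than a classification report. The sketch below computes mean absolute error and root mean squared error by hand for the ten rows shown in the question. It uses only the standard library; sklearn.metrics.mean_absolute_error and sklearn.metrics.mean_squared_error should give the equivalent library results.

```python
import math

labels = [29.75, 30.35, 31.59, 29.63, 29.60,
          30.22, 30.12, 30.09, 31.03, 31.05]
preds = [30.781430, 31.379146, 31.174824, 29.875497, 29.923161,
         30.257284, 30.374257, 30.357196, 30.971070, 30.930189]

# Per-row prediction error (predicted minus actual).
errors = [p - t for t, p in zip(labels, preds)]
n = len(errors)

# Mean absolute error: average size of the miss, in price units.
mae = sum(abs(e) for e in errors) / n

# Root mean squared error: penalises large misses more heavily.
rmse = math.sqrt(sum(e * e for e in errors) / n)
```

With the full 100-row DataFrame, the same lists could be taken from df_joined['Label'].tolist() and df_joined['Pred'].tolist().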



What happens if you pass them as lists, i.e.:



print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined['Label'].tolist(),df_joined['Pred'].tolist() )
