How important are logistic regression data labels?

I've studied logistic regression for a few days, and I think the labels of the logistic regression dataset should be 1 or 0, right?

But when I go through the regression datasets of the libSVM library, I see that the label values are continuous numbers (e.g. 1.0086, 1.0089, ...). Am I missing something?

Please note that the libSVM library can also be used for regression problems.

Many thanks!

+3




3 answers


Despite its name, logistic regression is a classification algorithm: it outputs the probability of a class given a data point. The training set labels should therefore be either 0 or 1. Logistic regression is not a suitable algorithm for the dataset you mention, whose labels are continuous.
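To make the point about 0/1 labels and probability outputs concrete, here is a minimal sketch of a one-feature logistic regression trained by plain gradient descent. The toy data, the function names (sigmoid, fit_logistic), the learning rate and the epoch count are all assumptions made for illustration; this is not how libSVM or any particular library implements it.

```python
import math

def sigmoid(z):
    # Squash any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    # Hypothetical helper: batch gradient descent on the mean log-loss
    # for a single feature x with binary labels y in {0, 1}.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (assumed): small x values belong to class 0, large ones to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # estimated P(class = 1 | x = 1.0)
p_high = sigmoid(w * 3.5 + b)  # estimated P(class = 1 | x = 3.5)
print(p_low, p_high)
```

The model is fitted to labels that are exactly 0 or 1, and what it returns is a number strictly between 0 and 1 that is read as a class probability. A label such as 1.0086 has no natural meaning in this setup.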



An SVM is a classification algorithm that conventionally uses the labels -1 and 1. It is not a probabilistic algorithm and does not output class probabilities directly. It can also be adapted for regression (support vector regression), in which case the labels are real-valued.

+2




Are you using a third-party library or implementing this yourself? Typically, these labels serve as the ground truth, so you can measure how effective your approach is.



For example, if your algorithm predicts -1 for a particular instance but the ground-truth label is +1, then it failed to classify that instance correctly.
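As a rough illustration of checking predictions against ground-truth labels, the sketch below computes the fraction of instances classified correctly. The function name accuracy and both label lists are made up for this example.

```python
def accuracy(predicted, ground_truth):
    # Fraction of instances whose predicted label equals the true label.
    if len(predicted) != len(ground_truth):
        raise ValueError("both label lists must have the same length")
    correct = sum(1 for p, t in zip(predicted, ground_truth) if p == t)
    return correct / len(ground_truth)

# Assumed example: the first instance is predicted as -1 but is really +1.
predicted = [-1, 1, 1, -1]
ground_truth = [1, 1, 1, -1]

acc = accuracy(predicted, ground_truth)
print(acc)  # 3 of the 4 instances match
```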

0




Note that "regression" is a general term. Saying that someone will perform regression analysis does not necessarily tell you which algorithm they will use, nor all the characteristics of the dataset. All it really tells you is that you have a set of samples with features that you want to use to predict a single outcome value (a conditional probability model).

One significant difference between logistic regression and linear regression is that the former is usually trained on a set of samples with categorical, binary labels, while the latter is trained on a set of samples with real-valued labels (ℝ).
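For contrast with the binary-label case, here is a small sketch of ordinary least-squares linear regression with one feature, where the labels are arbitrary real numbers. The helper name fit_line and the data points are assumptions chosen so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    # Closed-form least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumed toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Here the prediction is itself a real number on the same scale as the labels, not a class probability.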

Any time your labels are real-valued, you will probably either use linear regression or something similar, or convert those real-valued labels to categorical labels (for example through thresholds or bins) if you really want to use logistic regression. There may be a big difference in the quality and interpretation of your results, though, if you convert one kind of problem into the other.
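A minimal sketch of the two conversions mentioned above, thresholding and binning, might look like this. The first two label values come from the question; the other two values, the threshold, the bin edges and the function names are assumptions for illustration only.

```python
import bisect

def binarize(labels, threshold):
    # Map each real-valued label to 1 if it exceeds the threshold, else 0.
    return [1 if y > threshold else 0 for y in labels]

def bin_labels(labels, edges):
    # Map each label to the index of the bin it falls into.
    # edges must be sorted; a label equal to an edge goes to the upper bin.
    return [bisect.bisect_right(edges, y) for y in labels]

# 1.0086 and 1.0089 are from the question; 0.9950 and 1.0200 are invented.
labels = [1.0086, 1.0089, 0.9950, 1.0200]

binary = binarize(labels, threshold=1.0088)
binned = bin_labels(labels, edges=[1.0, 1.01])
print(binary)  # [0, 1, 0, 1]
print(binned)  # [1, 1, 0, 2]
```

The binary version could be fed to logistic regression, but note how much information the cut throws away: 1.0086 and 1.0089 end up in different classes even though they are nearly identical.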

See also Regression Analysis.

0








