Is Latent Semantic Indexing (LSI) a statistical classification algorithm?

Is Latent Semantic Indexing (LSI) a statistical classification algorithm? Why or why not?

Basically, I'm trying to understand why the Wikipedia page for statistical classification doesn't mention LSI. I'm just getting into this area, and I'm trying to figure out how all the different approaches to classification relate to each other.

+2




4 answers


No, they are not exactly the same thing. Statistical classification is designed to separate items as cleanly as possible: to make a crisp decision about whether item X belongs with the items in group A or the items in group B.



LSI, on the other hand, is designed to show the degree of similarity or difference between items, and above all to find the items that are most similar to a given item. So while the two are related, they are not the same thing.
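To make the "find similar items" behavior concrete, here is a minimal sketch of LSI-style similarity search. The term-document matrix and the choice of k = 2 latent dimensions are made up for illustration; real LSI would start from a large, weighted (e.g. tf-idf) matrix.

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix (terms = rows, docs = columns).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
], dtype=float)

# Truncated SVD: keep k latent dimensions (the "LSI" step).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def most_similar(query_idx):
    """Rank the other documents by cosine similarity to the query document."""
    q = doc_vecs[query_idx]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)  # most similar first
    return [int(i) for i in order if i != query_idx]

ranking = most_similar(0)  # documents ranked by similarity to document 0
```

Note that this returns a graded ranking of neighbors, not a class label; that is exactly the contrast with classification drawn above.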

+5




LSI/LSA is ultimately a dimensionality reduction technique, and it is usually combined with a nearest-neighbor algorithm to turn it into a classification system. By itself, it is only a way to "index" the data in a lower-dimensional space using SVD.
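The pairing this answer describes can be sketched in a few lines: reduce the data with a truncated SVD, then classify new points by their nearest training neighbor in the reduced space. The two-cluster synthetic data, the choice of k = 2, and the 1-nearest-neighbor rule are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two fuzzy clusters in a 10-D "term space", 20 points each.
X = np.vstack([rng.normal(0, 1, (20, 10)) + 3,
               rng.normal(0, 1, (20, 10)) - 3])
y = np.array([0] * 20 + [1] * 20)

# Dimensionality reduction via truncated SVD (the "LSI" step).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T  # project every point onto the top-k right singular vectors

def knn_predict(x):
    """Classify a raw point by its single nearest training neighbor
    in the reduced space."""
    z = x @ Vt[:k].T
    dists = np.linalg.norm(Z - z, axis=1)
    return int(y[np.argmin(dists)])

# A new point at the center of cluster 0 should get label 0.
pred = knn_predict(np.ones(10) * 3)
```

The SVD step alone only produces the coordinates `Z`; the classification decision comes entirely from the nearest-neighbor rule bolted on afterwards.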



+3




Have you read about LSI on Wikipedia? It says LSI uses matrix factorization (SVD), which in turn is sometimes used in classification.

+1




The main divide in machine learning is between "supervised" and "unsupervised" modeling.

Usually the phrase "statistical classification" refers to supervised models, but not always.

With supervised methods, the training set contains a ground-truth label, and you build the model to predict that label. When you evaluate the model, the goal is to produce the best guess at (or probability distribution over) the true label, which you will not have at evaluation time. There is usually a performance metric, and it is very clear what a right or wrong answer is.

Unsupervised methods attempt to group a large number of data points, which may look like a complex mix of different kinds, into a smaller number of "similar" categories. The data within each category should be alike in some "interesting" or "deep" way. Since there is no ground truth, you cannot judge "right vs. wrong", only "more vs. less" interesting or useful.

Similarly, at evaluation time you can assign new examples to one of the clusters (a crisp classification) or give a set of weights describing how similar the example is to each cluster's "archetype" (a soft assignment).
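The crisp-vs.-soft assignment described above can be sketched as follows. The two cluster "archetypes" (centroids) and the exponential weighting are invented for illustration; any monotone similarity function would do.

```python
import numpy as np

# Hypothetical cluster archetypes (centroids) in 2-D.
archetypes = np.array([[0.0, 0.0],     # cluster A
                       [10.0, 10.0]])  # cluster B

def assign(x):
    """Return both a crisp cluster label and soft membership weights."""
    dists = np.linalg.norm(archetypes - x, axis=1)
    hard = int(np.argmin(dists))     # crisp: nearest archetype wins
    weights = np.exp(-dists)         # soft: similarity decays with distance
    soft = weights / weights.sum()   # normalize to a membership distribution
    return hard, soft

hard, soft = assign(np.array([1.0, 1.0]))  # a point near cluster A
```

The crisp label throws away the degree of similarity; the soft weights keep it, which is closer in spirit to what LSI reports.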

So, in a sense, both supervised and unsupervised models produce something like a "prediction" (a predicted class or cluster label), but they are inherently different.

Often, the goal of an unsupervised model is to provide smarter, more compact inputs for a subsequent supervised model.

+1








