Is the train / test split in unsupervised learning necessary / useful?
Well, it depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve it.
Roughly: - Dimensionality reduction methods are usually evaluated by their reconstruction error, so we can use the standard k-fold cross-validation routine
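As a sketch of that idea (assuming scikit-learn and PCA as the dimensionality reduction method; the function name and the synthetic data are just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # placeholder data; use your own matrix

def cv_reconstruction_error(X, n_components=3, n_splits=5):
    """Estimate PCA reconstruction error on held-out data via k-fold CV."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(X):
        # Fit the projection on the training fold only
        pca = PCA(n_components=n_components).fit(X[train_idx])
        X_test = X[test_idx]
        # Project down and back up, then measure the squared error
        X_rec = pca.inverse_transform(pca.transform(X_test))
        errors.append(np.mean((X_test - X_rec) ** 2))
    return float(np.mean(errors))

print(cv_reconstruction_error(X))
```

Because the error is measured on folds the model never saw, it can be compared across choices of `n_components` the same way a supervised validation score would be.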
For clustering algorithms, on the other hand, I would suggest statistical testing to check performance. There is also a somewhat time-consuming trick: split the dataset, label the held-out cases with meaningful classes, and cross-validate against those labels
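One hedged way to sketch a split-based check for clustering (assuming scikit-learn and k-means; the blob data and cluster count are illustrative): fit on one half, assign the held-out half, then refit on the held-out half directly. If the clustering is stable, the two labelings should largely agree, which the adjusted Rand index quantifies.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic, well-separated clusters purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test = train_test_split(X, test_size=0.5, random_state=0)

# Fit on the training half, then assign the held-out points
km_train = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
labels_from_train = km_train.predict(X_test)

# Refit directly on the held-out half; a stable clustering should agree
km_test = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_test)
labels_from_test = km_test.labels_

# 1.0 means the two labelings agree up to a relabeling of clusters
score = adjusted_rand_score(labels_from_train, labels_from_test)
print(score)
```

This is a stability check rather than a supervised accuracy; when true class labels are available for the test split (as the trick above suggests), `adjusted_rand_score` can be computed against those instead.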
Either way, if an unsupervised algorithm is run on controlled (labeled) data, it can always be cross-checked against those labels
In general: - There is no strict need to split the data into train and test sets, but if we can do it, it is always better
Here is an article that explains how cross-validation is a good tool for unsupervised learning http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here http://arxiv.org/pdf/0909.3052.pdf