Is the train / test split in unsupervised learning necessary / useful?

In supervised learning, I use a typical train / test split to learn an algorithm, for example regression or classification. Regarding unsupervised learning, my question is: is a train / test split necessary and useful there too? If so, why?



1 answer


Well, it depends on the problem, the shape of the dataset, and the class of unsupervised algorithm used to solve it.

Roughly: dimensionality-reduction methods are usually evaluated by their reconstruction error, so the standard k-fold cross-validation routine can be applied.
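As a minimal sketch of this idea (the dataset, component count, and fold count here are illustrative assumptions, not from the answer): fit PCA on each training fold and measure the mean squared reconstruction error on the held-out fold.

```python
# Sketch: k-fold cross-validation of a dimensionality-reduction method
# via held-out reconstruction error. Data and n_components are arbitrary.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # toy dataset

errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    pca = PCA(n_components=3).fit(X[train_idx])
    # Project the held-out fold into the subspace and reconstruct it.
    X_rec = pca.inverse_transform(pca.transform(X[test_idx]))
    errors.append(np.mean((X[test_idx] - X_rec) ** 2))

mean_error = float(np.mean(errors))
print(mean_error)
```

A lower held-out reconstruction error suggests the learned subspace generalizes rather than merely memorizing the training folds, which is exactly what the train / test split is meant to detect.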

For clustering algorithms, I would suggest statistical testing to check performance. There is also a slightly time-consuming trick: split the dataset, label the test portion with meaningful classes, and cross-validate against those labels.
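That trick might look like the following sketch (the iris dataset, k-means, and the adjusted Rand index are my illustrative choices, not prescribed by the answer): fit clusters on the training split only, assign held-out points to the learned clusters, and compare against the known labels.

```python
# Sketch: validating a clustering algorithm on a labeled hold-out split.
# Dataset, algorithm, and metric are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the clusters on the training portion only.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Assign held-out points to the learned clusters and score agreement
# with the meaningful class labels (label-permutation invariant).
test_clusters = km.predict(X_test)
score = adjusted_rand_score(y_test, test_clusters)
print(score)
```

The adjusted Rand index is convenient here because it ignores the arbitrary numbering of cluster IDs and only measures agreement of the grouping with the classes.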

In any case, if an unsupervised algorithm is applied to data for which labels are available, it can always be cross-validated in this way.



In general: there is no need to split the data into train / test sets, but if we can do it, it is always better.

Here is an article that explains why cross-validation is a good tool for unsupervised learning: http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here: http://arxiv.org/pdf/0909.3052.pdf

https://www.researchgate.net/post/Which_are_the_methods_to_validate_an_unsupervised_machine_learning_algorithm

