Is the train / test split in unsupervised learning necessary / useful?
Well, it depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve it.
Roughly: - Dimensionality reduction methods are usually evaluated by their reconstruction error, so we can use the standard k-fold cross-validation routine
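As a sketch of that idea (assuming scikit-learn and PCA as the dimensionality reduction method; the function name and the synthetic data are just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # placeholder data; use your own matrix

def cv_reconstruction_error(X, n_components=3, n_splits=5):
    """Estimate PCA reconstruction error on held-out data via k-fold CV."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(X):
        # Fit the projection on the training fold only
        pca = PCA(n_components=n_components).fit(X[train_idx])
        X_test = X[test_idx]
        # Project down and back up, then measure the squared error
        X_rec = pca.inverse_transform(pca.transform(X_test))
        errors.append(np.mean((X_test - X_rec) ** 2))
    return float(np.mean(errors))

print(cv_reconstruction_error(X))
```

Because the error is measured on folds the model never saw, it can be compared across choices of `n_components` the same way a supervised validation score would be.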
For clustering algorithms, on the other hand, I would suggest statistical testing to check performance. There is also a somewhat time-consuming trick: split the dataset, label the held-out cases with meaningful classes, and cross-validate against those labels
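One hedged way to sketch a split-based check for clustering (assuming scikit-learn and k-means; the blob data and cluster count are illustrative): fit on one half, assign the held-out half, then refit on the held-out half directly. If the clustering is stable, the two labelings should largely agree, which the adjusted Rand index quantifies.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic, well-separated clusters purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test = train_test_split(X, test_size=0.5, random_state=0)

# Fit on the training half, then assign the held-out points
km_train = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
labels_from_train = km_train.predict(X_test)

# Refit directly on the held-out half; a stable clustering should agree
km_test = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_test)
labels_from_test = km_test.labels_

# 1.0 means the two labelings agree up to a relabeling of clusters
score = adjusted_rand_score(labels_from_train, labels_from_test)
print(score)
```

This is a stability check rather than a supervised accuracy; when true class labels are available for the test split (as the trick above suggests), `adjusted_rand_score` can be computed against those instead.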
Either way, if an unsupervised algorithm is run on controlled (labeled) data, it can always be cross-checked against those labels
In general: - There is no strict need to split the data into train and test sets, but if we can do it, it is always better
Here is an article that explains how cross-validation is a good tool for unsupervised learning http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here http://arxiv.org/pdf/0909.3052.pdf