Clustering data without input parameters

This is more of a theoretical question:

Do you know of any clustering algorithm (flat or hierarchical) that does not require input parameters such as the number of clusters or the size of the neighborhood? In other words, you just feed your data to the algorithm and get clusters as output.

I would appreciate pointers to any relevant papers or documentation.

2 answers


Determining the number of clusters automatically is a hard problem and is still considered an open research question.

One of the more recent approaches is to model your data as a Dirichlet process mixture (see Bayesian Hierarchical Clustering), but this is not trivial and requires a solid background in Bayesian methods and Markov chain Monte Carlo (MCMC) estimation. This method can automatically estimate the number of clusters.
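To illustrate how a Dirichlet process mixture lets the data choose the number of clusters, here is a minimal sketch of a collapsed Gibbs sampler (one MCMC approach, based on the Chinese restaurant process) for one-dimensional data. It is not the Bayesian Hierarchical Clustering algorithm itself, and it rests on assumptions of my own: Gaussian clusters with a known shared variance `sigma2`, and a Normal(`mu0`, `tau2`) prior on the cluster means. The function name `dp_mixture_gibbs` is hypothetical. Note that the number of clusters is inferred, but the concentration `alpha` and the priors are still hyperparameters, so the method is not strictly parameter-free.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dp_mixture_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                     sweeps=200, burn_in=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D Dirichlet process mixture of Gaussians.

    Assumptions (illustrative only): every cluster has known variance sigma2 and
    cluster means have a Normal(mu0, tau2) prior. Returns the final labels and
    the most frequent number of clusters seen after burn-in.
    """
    if sweeps <= burn_in:
        raise ValueError("sweeps must exceed burn_in")
    rng = random.Random(seed)
    n = len(data)
    labels = [0] * n          # start with every point in one cluster
    counts = {0: n}           # cluster id -> number of points
    sums = {0: sum(data)}     # cluster id -> sum of its points
    next_id = 1
    k_history = []

    for sweep in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster (drop the cluster if empty).
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k]
                del sums[k]

            # Existing cluster c: weight n_c times the posterior predictive
            # density of x given the other points currently in c.
            ids, weights = [], []
            for c, n_c in counts.items():
                post_var = 1.0 / (1.0 / tau2 + n_c / sigma2)
                post_mean = post_var * (mu0 / tau2 + sums[c] / sigma2)
                ids.append(c)
                weights.append(n_c * normal_pdf(x, post_mean, post_var + sigma2))
            # New cluster: weight alpha times the prior predictive density.
            ids.append(next_id)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))

            # Sample the new assignment in proportion to the weights.
            choice = rng.choices(ids, weights=weights)[0]
            if choice == next_id:
                counts[choice] = 0
                sums[choice] = 0.0
                next_id += 1
            labels[i] = choice
            counts[choice] += 1
            sums[choice] += x

        if sweep >= burn_in:
            k_history.append(len(counts))

    k_mode = Counter(k_history).most_common(1)[0][0]
    return labels, k_mode


if __name__ == "__main__":
    data = ([-5.0 + 0.02 * j for j in range(-10, 10)]
            + [5.0 + 0.02 * j for j in range(-10, 10)])
    labels, k = dp_mixture_gibbs(data, mu0=sum(data) / len(data))
    print("estimated number of clusters:", k)
```

The reported count is the most frequent cluster count over the post-burn-in sweeps, since a single Gibbs sample can briefly contain a spurious singleton cluster. Raising `alpha` makes opening new clusters more likely a priori.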



Usually, the answer becomes clear once you define what you mean by a cluster. That is the hard part.

For real-valued data, I like to use mean shift with automatic selection of the bandwidth h. The clusters correspond to the modes of the estimated data density, and the result is similar to a watershed transform.
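Here is a minimal sketch of mean shift on one-dimensional data. It assumes a Gaussian kernel and uses a normal-reference (Silverman-style) rule of thumb, h = 1.06 * sd * n^(-1/5), as the automatic bandwidth choice; the answer does not say which selector it uses, and the helper names `silverman_bandwidth` and `mean_shift_1d` are hypothetical. Each point climbs the kernel density estimate to a mode, and points that reach the same mode form one cluster.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Normal-reference rule of thumb (an assumption): h = 1.06 * sd * n^(-1/5)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def mean_shift_1d(data, h=None, tol=1e-7, max_iter=500):
    """Cluster 1-D data by hill-climbing a Gaussian kernel density estimate.

    Each point repeatedly moves to the kernel-weighted mean of the data around
    it until it stops moving; points that end at the same mode share a label.
    """
    if h is None:
        h = silverman_bandwidth(data)
    modes = []    # distinct density modes found so far
    labels = []
    for x in data:
        y = x
        for _ in range(max_iter):
            # Gaussian kernel weights of all data points relative to y.
            weights = [math.exp(-0.5 * ((y - xi) / h) ** 2) for xi in data]
            y_new = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
            converged = abs(y_new - y) < tol * h
            y = y_new
            if converged:
                break
        # A converged point within h/4 of a known mode joins that cluster;
        # otherwise it defines a new mode.
        for k, m in enumerate(modes):
            if abs(y - m) < h / 4:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return labels, modes, h


if __name__ == "__main__":
    data = ([-5.0 + 0.02 * j for j in range(-10, 10)]
            + [5.0 + 0.02 * j for j in range(-10, 10)])
    labels, modes, h = mean_shift_1d(data)
    print("bandwidth:", h, "modes:", modes)
```

The rule of thumb is optimal only for roughly normal data and tends to oversmooth multimodal data, which can merge nearby modes; a data-driven selector such as cross-validation is a common alternative.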



http://en.wikipedia.org/wiki/Mean-shift
http://en.wikipedia.org/wiki/Kernel_density_estimation
http://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation
