Clustering data without input parameters

Question

This is more of a theoretical question:

Do you know of any clustering algorithm (flat or hierarchical) that does not require any input parameters, such as the number of clusters or the size of the neighborhood? In other words, you just feed your data to the algorithm as input and get clusters as output.

I would appreciate pointers to relevant papers or documentation.

parameters machine-learning hierarchical-clustering

Alina 07 Feb 13 at 16:43

2 answers

iTech · Answer 1 · 2013-02-08T04:09:57+0000

Determining the number of clusters automatically is a hard problem, and it is still considered an open research question.

One state-of-the-art approach is to model your data as a Dirichlet Process Mixture (see Bayesian Hierarchical Clustering), but this is not trivial and requires a solid background in Bayesian methods and Markov chain Monte Carlo (MCMC) estimation.

This method can automatically estimate the number of clusters.
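To give a feel for why a Dirichlet Process Mixture does not need the number of clusters up front, here is a minimal sketch of the Chinese restaurant process, a standard way to describe the prior over partitions that a Dirichlet process induces. It only samples from the prior and performs no inference on data. The function name `crp_assignments` and the parameters `alpha` and `seed` are illustrative choices, not part of the answer above. Note that the concentration parameter `alpha` is still a hyperparameter, so the model is not entirely parameter-free.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process.

    alpha is the concentration parameter (an assumption of this sketch):
    larger values make new clusters more likely.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts


labels, counts = crp_assignments(100, alpha=1.0)
print(len(counts), "clusters drawn from the prior")
```

The number of clusters is a random outcome of the process rather than an input. In a full mixture model, MCMC would combine this prior with the data likelihood to infer the assignments.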

Don reba · Answer 2 · 2013-02-07T18:39:01+0000

Usually, the answer follows once you define what you mean by a cluster. That is the hard part.

For real-valued data, I like to use mean shift with automatic selection of the bandwidth h. The clusters correspond to the modes of the estimated data density, and the clustering result is similar to a watershed transform.

http://en.wikipedia.org/wiki/Mean-shift
http://en.wikipedia.org/wiki/Kernel_density_estimation
http://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation
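As a rough illustration of mean shift with an automatically chosen bandwidth, here is a minimal one-dimensional sketch in plain Python. It assumes a Gaussian kernel and picks h with Silverman's rule of thumb, which is just one possible selection rule and not necessarily the one the answer has in mind. The function names, the tolerance, and the h / 2 merge threshold are all illustrative assumptions.

```python
import math
import statistics


def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel (Silverman's rule)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q = statistics.quantiles(xs, n=4)
    iqr = q[2] - q[0]
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-1 / 5)


def mean_shift_1d(xs, h=None, tol=1e-6, max_iter=500):
    """Cluster 1-D points by moving each one uphill to a density mode."""
    if h is None:
        h = silverman_bandwidth(xs)
    modes = []
    for x in xs:
        m = x
        for _ in range(max_iter):
            # Gaussian kernel weights of all points relative to the current position.
            w = [math.exp(-0.5 * ((m - xi) / h) ** 2) for xi in xs]
            new_m = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
            converged = abs(new_m - m) < tol
            m = new_m
            if converged:
                break
        modes.append(m)
    # Points whose modes lie within h / 2 of each other share a cluster
    # (the threshold is an assumption of this sketch).
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) < h / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


data = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
labels, centers = mean_shift_1d(data)
print(labels, centers)
```

No cluster count is passed in: the number of clusters is however many distinct modes the density estimate has. The bandwidth rule still encodes an assumption about smoothness, so a different rule can give a different number of clusters.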
