Inter-cluster and intra-cluster distances

I found the following formulas for inter-cluster and intra-cluster distances and I'm not sure if I understand how they work.

[formula image]

Distance between clusters

[formula image]

Shouldn't there be a square root in the formulas above?

Inter-cluster and intra-cluster:

[formula image] [formula image]

Why does the index j start at N + 1 instead of running from 1 to N2?

Which one is correct? Are they equivalent? Or should I just use the distance between centroids as the inter-cluster distance? That seems pretty straightforward. And what about the intra-cluster distance?

I find the formulas on Wikipedia (http://en.wikipedia.org/wiki/Cluster_analysis#Internal_evaluation) even harder to understand.

I need these distances to group colors correctly so that I can build a reduced color palette, so I assume that the more precise the distances are, the more accurate the grouping will be (that is, using the formula rather than the centroid distance for the inter-cluster distance). The vectors are 3-dimensional (RGB components).



1 answer


Many algorithms don't really use "distance".

k-means, for example, minimizes the variance, which is the sum of squares you see here. A sum of squares is a squared Euclidean distance, so you can argue that the algorithm also tries to minimize Euclidean distances; but the "natural" formulation of the algorithm uses sums of squares, not Euclidean distances. If I'm not mistaken, the same applies to Ward clustering: you have to compute it using variance, not Euclidean distance.
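As a rough illustration of the sum-of-squares view, here is a minimal Python sketch for RGB vectors. The function names (`centroid`, `squared_distance`, `within_cluster_ss`, `between_centroids_sq`) and the sample colors are made up for this example, and the centroid-based inter-cluster measure is only one possible choice, not necessarily the formula from the question.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def squared_distance(a, b):
    """Squared Euclidean distance: a plain sum of squares, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points):
    """Intra-cluster measure: sum of squared distances to the centroid.

    This is the per-cluster quantity that k-means minimizes.
    """
    c = centroid(points)
    return sum(squared_distance(p, c) for p in points)


def between_centroids_sq(cluster_a, cluster_b):
    """Inter-cluster measure (an assumption here): squared centroid distance."""
    return squared_distance(centroid(cluster_a), centroid(cluster_b))


# Hypothetical sample data: two small groups of RGB colors.
reds = [(250, 10, 10), (240, 20, 0), (230, 0, 20)]
blues = [(10, 10, 250), (0, 20, 240), (20, 0, 230)]

print(within_cluster_ss(reds))            # 600.0
print(within_cluster_ss(blues))           # 600.0
print(between_centroids_sq(reds, blues))  # 105800.0
```

Taking the square root of any of these values afterwards gives an ordinary distance, but comparing clusterings does not require it.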



Note that if you minimize z^2 and z cannot be negative, then you have minimized z as well.
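A small sketch of this point, assuming you want to assign a pixel to the nearest palette color: picking the minimum by squared distance and by true Euclidean distance selects the same entry, because the square root is monotonic for non-negative values. The helper names and the sample palette are hypothetical. Note that this holds for a single non-negative quantity; a sum of squares and a sum of distances over many points can have different minimizers.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_by_squared(point, centroids):
    """Index of the closest centroid, compared by squared distance."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def nearest_by_euclidean(point, centroids):
    """Index of the closest centroid, compared by true Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: math.sqrt(squared_distance(point, centroids[i])))


# Hypothetical palette of three RGB centroids and one pixel to assign.
palette = [(240, 10, 10), (10, 10, 240), (10, 240, 10)]
pixel = (200, 60, 30)

print(nearest_by_squared(pixel, palette))    # 0
print(nearest_by_euclidean(pixel, palette))  # 0
```

So when the distances are only used to decide which cluster is closest, the square root can be skipped.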

See also: https://stats.stackexchange.com/questions/95793/is-there-an-advantage-to-squaring-dissimilarities-when-using-ward-clustering
