Inter-cluster and intra-cluster distances
I found the following formulas for inter-cluster and intra-cluster distances and I'm not sure if I understand how they work.
Distance between clusters
Shouldn't there be a square root in the formulas above?
Inter-cluster and intra-cluster:
Why does the index j start at N + 1 instead of running from 1 to N2?
Which one is correct? Or are they equivalent? Or should I just use the distance between centroids as the inter-cluster distance? That seems pretty straightforward. And what about the intra-cluster distance?
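To make the comparison concrete, here is a minimal sketch of three common measures for RGB points. The function names (`centroid_distance`, `average_linkage`, `mean_intra_distance`) and the sample clusters are hypothetical illustrations, not the formulas from the question: centroid distance, the mean of all N1 × N2 cross-cluster pair distances (average linkage), and the mean distance from each point to its own centroid.

```python
import math
from itertools import product


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def centroid_distance(c1, c2):
    """Inter-cluster distance: Euclidean distance between the two centroids."""
    return math.dist(centroid(c1), centroid(c2))


def average_linkage(c1, c2):
    """Inter-cluster distance: mean Euclidean distance over all cross-cluster pairs."""
    total = sum(math.dist(a, b) for a, b in product(c1, c2))
    return total / (len(c1) * len(c2))


def mean_intra_distance(cluster):
    """Intra-cluster distance: mean Euclidean distance from each point to the centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)


# Hypothetical sample clusters of RGB colors
reds = [(255, 0, 0), (250, 10, 10), (240, 5, 0)]
blues = [(0, 0, 255), (10, 10, 250)]

print(centroid_distance(reds, blues))
print(average_linkage(reds, blues))
print(mean_intra_distance(reds))
```

The two inter-cluster measures are not equivalent in general. Because the Euclidean norm is convex, the average-linkage value is always greater than or equal to the centroid distance, and the two coincide only in degenerate cases such as single-point clusters.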
I find the formulas on Wikipedia (http://en.wikipedia.org/wiki/Cluster_analysis#Internal_evaluation) even harder to understand.
I need to calculate these distances to group colors correctly in order to create a reduced color palette. I think the more precise these distances are, the more accurate the grouping will be (hence the full formula rather than the centroid distance for the inter-cluster distance). The vectors are 3-dimensional (RGB components).
Many algorithms don't really use "distance".
k-means, for example, minimizes the variance, which is the sum of squares you see here. The sum of squares is the squared Euclidean distance, so one can argue that this algorithm also minimizes Euclidean distances; but the "natural" formulation of the algorithm uses sums of squares, not Euclidean distances. If I'm not mistaken, the same applies to Ward clustering: you have to compute it using variance, not Euclidean distance.
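A small numerical check can illustrate why no square root appears. The sketch below (helper names are hypothetical) computes the within-cluster sum of squares that k-means minimizes in two ways: as squared Euclidean distances to the centroid, and from pairwise squared distances via the identity SSE = (1/n) · Σ over pairs i<j of ‖x_i − x_j‖². It also checks the standard Ward merge cost, n1·n2/(n1+n2) · ‖c1 − c2‖², against the actual increase in SSE. Both are worked in squared distances throughout.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse_to_centroid(cluster):
    """Within-cluster sum of squares: the quantity k-means minimizes."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def sse_from_pairs(cluster):
    """Same quantity computed from pairwise squared distances only."""
    n = len(cluster)
    return sum(sq_dist(a, b) for a, b in combinations(cluster, 2)) / n


def ward_merge_cost(c1, c2):
    """Increase in total SSE when clusters c1 and c2 are merged (Ward's criterion)."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * sq_dist(centroid(c1), centroid(c2))


# Hypothetical sample clusters of RGB colors
reds = [(255, 0, 0), (250, 10, 10), (240, 5, 0)]
blues = [(0, 0, 255), (10, 10, 250)]

print(sse_to_centroid(reds), sse_from_pairs(reds))  # equal up to rounding
print(ward_merge_cost(reds, blues))
```

Dividing the SSE by the number of points gives the (population) variance, which is why "minimizing variance" and "minimizing squared Euclidean distance to the centroid" describe the same objective.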
Note that if you minimize z^2 and z cannot be negative, then you have minimized z as well.
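In practice this means the square root can be skipped whenever distances are only compared, for example when assigning each color to its nearest palette entry. The sketch below uses a hypothetical four-color palette and a coarse grid of RGB values, and checks that choosing the nearest entry by squared distance gives the same result as choosing it by true Euclidean distance.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclid(a, b):
    """Euclidean distance: square root of the squared distance."""
    return math.sqrt(sq_dist(a, b))


def nearest(point, centroids, dist):
    """Index of the centroid closest to point under the given distance function."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical palette and a coarse grid of RGB test colors
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
levels = range(0, 256, 51)
colors = [(r, g, b) for r in levels for g in levels for b in levels]

same = all(nearest(c, palette, sq_dist) == nearest(c, palette, euclid) for c in colors)
print(same)
```

Since the square root is strictly increasing on non-negative numbers, it never changes which candidate is smallest; it only matters when the actual distance value is reported or added to other distances.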