Merge Cluster Threshold

Question

I'm working with mean shift; this procedure calculates where each point in the data set converges. I can also calculate the Euclidean distance between the coordinates at which two different points converge, but I have to supply a threshold: if (distance < threshold), the two points belong to the same cluster and I can merge them.

How can I find the correct value to use as a threshold?
(I can use any value, and the result depends on it, but I need the optimal value.)

cluster-analysis threshold kernel-density

Federico Catalano Jan 24 13 at 19:16

1 answer

mattnedrich · Answer 1 · 2014-06-24T02:39:47+0000

I have used mean shift clustering several times and have run into this same problem. Depending on how many iterations you are willing to shift each point for, or what your termination criteria are, there is usually some post-processing step where you need to group the shifted points into clusters. Points that theoretically converge to the same mode will not, in practice, necessarily end up exactly on top of each other.

I think the best and most common way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past, my code for this post-processing has usually looked something like this:

threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
    cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
    if cluster is None:
        # no existing cluster is close enough: start a new cluster with p as its first point
        clusters.append([p])
    else:
        # add p to the matching cluster
        cluster.append(p)

For the function findExistingClusterWithinThresholdOfPoint, I usually use the minimum distance from p to each current cluster.
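A minimal sketch of how that helper and the surrounding loop might look in Python follows. The snake_case names, the use of the closest qualifying cluster when several are within the threshold, and the sample points are all assumptions made for illustration; the answer only specifies that the minimum distance from p to each cluster is compared against the threshold.

```python
import math


def min_distance_to_cluster(p, cluster):
    """Smallest Euclidean distance from p to any point already in the cluster."""
    return min(math.dist(p, q) for q in cluster)


def find_existing_cluster_within_threshold(p, clusters, threshold):
    """Return the cluster closest to p whose minimum distance is below
    threshold, or None if no cluster qualifies.

    Picking the closest qualifying cluster is an assumption; returning the
    first match would also fit the description in the answer.
    """
    best, best_dist = None, threshold
    for cluster in clusters:
        d = min_distance_to_cluster(p, cluster)
        if d < best_dist:
            best, best_dist = cluster, d
    return best


def group_shifted_points(shifted_points, kernel_bandwidth):
    """Group converged (shifted) points using a bandwidth-based threshold."""
    threshold = 0.5 * kernel_bandwidth
    clusters = []
    for p in shifted_points:
        cluster = find_existing_cluster_within_threshold(p, clusters, threshold)
        if cluster is None:
            clusters.append([p])  # start a new cluster with p
        else:
            cluster.append(p)     # p joins the matching cluster
    return clusters


# Hypothetical converged points: two modes near (0, 0) and (5, 5).
shifted = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 4.95), (0.05, 0.1)]
clusters = group_shifted_points(shifted, kernel_bandwidth=1.0)
```

With these sample points and a bandwidth of 1.0, the points near the origin should end up in one cluster and the points near (5, 5) in another. Note that the result of a greedy pass like this can depend on the order of the points when modes lie close to the threshold distance apart.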

This seems to work pretty well. Hope this helps.
