Mean Shift: Cluster Merge Threshold

I'm working with mean shift. This procedure calculates where each point in the dataset converges. I can also calculate the Euclidean distance between the locations where two different points converge, but I have to choose a threshold: if (distance < threshold), then these points belong to the same cluster and I can merge them.
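
The procedure described above can be sketched as follows. This is a minimal illustration, not the asker's code: it assumes a Gaussian kernel (the question does not say which kernel is used), points stored as tuples of floats, and Python 3.8+ for math.dist. The names gaussian_mean_shift, same_cluster, tol and max_iter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_mean_shift(x: Point, data: Sequence[Point], bandwidth: float,
                        tol: float = 1e-6, max_iter: int = 500) -> Point:
    """Move x uphill on the kernel density estimate of `data` until it stops moving."""
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to the current position
        weights = [math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2)) for q in data]
        total = sum(weights)
        # mean shift step: jump to the kernel-weighted mean of the data
        new_x = tuple(sum(w * q[d] for w, q in zip(weights, data)) / total
                      for d in range(len(x)))
        step = math.dist(x, new_x)
        x = new_x
        # stopping criterion: the last step was negligibly small
        if step < tol:
            break
    return x


def same_cluster(a: Point, b: Point, threshold: float) -> bool:
    """Merge rule from the question: convergence points closer than threshold share a cluster."""
    return math.dist(a, b) < threshold


data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
modes = [gaussian_mean_shift(p, data, bandwidth=1.0) for p in data]
print(same_cluster(modes[0], modes[1], threshold=0.5))  # True
print(same_cluster(modes[0], modes[3], threshold=0.5))  # False
```

Points from the same blob converge to locations that are very close but not necessarily identical, which is exactly why some threshold is needed.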

How can I find the right value to use as the threshold?
(I can use any value and the result depends on it, but I need the optimal value.)

Answer


I have used mean shift clustering several times and ran into this problem. Depending on how many iterations you let each point shift, or which stopping criterion you use, there is usually a post-processing step in which you need to group the shifted points into clusters. Points that theoretically converge to the same mode will not, in practice, end up exactly on top of each other.

I think the best and most common approach is a threshold based on the kernel bandwidth, as suggested in the comments. In the past, my code for this post-processing step usually looked something like this:

threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
    cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
    if cluster is None:
        # no existing cluster is within the threshold: start a new cluster with p as its first point
        clusters.append([p])
    else:
        # p is within the threshold of an existing cluster: add it to that cluster
        cluster.append(p)

      



For the function findExistingClusterWithinThresholdOfPoint, I usually use the minimum distance from p to each current cluster.
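
A runnable sketch of the whole post-processing step follows. It combines the loop above with one reading of the minimum-distance rule: the distance from p to the closest point already in a cluster, with p joining the nearest such cluster if that distance is below the threshold. Only the helper name and the 0.5 factor come from the answer. group_shifted_points and its factor parameter are hypothetical, points are assumed to be tuples of floats, and math.dist needs Python 3.8+.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def findExistingClusterWithinThresholdOfPoint(p: Point, clusters: List[Cluster],
                                              threshold: float) -> Optional[Cluster]:
    """Return the cluster whose closest member is nearest to p, if that distance is below threshold."""
    best_cluster: Optional[Cluster] = None
    best_dist = threshold
    for cluster in clusters:
        # minimum distance from p to any point already in this cluster
        d = min(math.dist(p, q) for q in cluster)
        if d < best_dist:
            best_cluster, best_dist = cluster, d
    return best_cluster


def group_shifted_points(shifted_points: Sequence[Point], kernel_bandwidth: float,
                         factor: float = 0.5) -> List[Cluster]:
    """Greedily group mean-shifted points into clusters using a bandwidth-based threshold."""
    threshold = factor * kernel_bandwidth
    clusters: List[Cluster] = []
    for p in shifted_points:
        cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
        if cluster is None:
            clusters.append([p])  # nothing close enough: p starts a new cluster
        else:
            cluster.append(p)     # join the nearest cluster within the threshold
    return clusters


shifted = [(0.100, 0.130), (0.101, 0.131), (5.030, 5.000), (0.099, 0.129), (5.031, 5.001)]
clusters = group_shifted_points(shifted, kernel_bandwidth=1.0)
print([len(c) for c in clusters])  # [3, 2]
```

Because the rule uses the closest member rather than, say, the cluster mean, a chain of shifted points that are each within the threshold of the next will end up in one cluster; that is usually harmless when the points have converged tightly to their modes.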

This seems to work pretty well. Hope this helps.
