Python: DBSCAN in 3D

Question

I've been looking for a DBSCAN implementation for 3D points without much luck. Does anyone know of a library that handles this, or have any experience with it? My guess is that the DBSCAN algorithm can handle three dimensions if the eps value is treated as the neighbourhood radius and the distance between points is measured as Euclidean separation. If anyone has implemented this and would like to share it, I would be very grateful. Thanks.

+3

python cluster-analysis dbscan

user2909415 Oct 7 '14 at 21:59

3 answers

You can use sklearn for DBSCAN. Here is code that works for me:

from sklearn.cluster import DBSCAN
import numpy as np
data = np.random.rand(500,3)

db = DBSCAN(eps=0.12, min_samples=1).fit(data)
labels = db.labels_
from collections import Counter
Counter(labels)

The output I got is

Counter({1: 342, 10: 30, 31: 13, 13: 11, 30: 10, 24: 5, 29: 5, 2: 4, 18: 4,
19: 4, 28: 4, 49: 4, 3: 3, 17: 3, 23: 3, 32: 3, 7: 2, 9: 2, 12: 2, 14: 2, 15: 2,
16: 2, 20: 2, 21: 2, 26: 2, 39: 2, 41: 2, 46: 2, 0: 1, 4: 1, 5: 1, 6: 1, 8: 1, 11:
1, 22: 1, 25: 1, 27: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 40: 1, 42: 1,
43: 1, 44: 1, 45: 1, 47: 1, 48: 1, 50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 1})

Thus, the clustering identifies 56 clusters (labels 0 through 55), and the Counter above shows the number of points in each cluster.
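
A note on reading the labels: scikit-learn marks noise with the label -1, and `min_samples` counts the point itself, so with `min_samples=1` every point is a core point and nothing is ever labelled as noise. With a larger `min_samples` you would normally want to separate noise from real clusters. A minimal sketch of such a summary follows; the helper name `summarize_labels` and the sample labels are made up for illustration, and in practice you would pass `db.labels_`.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points, size of each cluster)."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # scikit-learn uses -1 for noise points
    return len(counts), n_noise, dict(counts)


# Hypothetical labels standing in for DBSCAN(...).fit(data).labels_
example_labels = [0, 0, 1, -1, 1, 0, -1, 2]
n_clusters, n_noise, sizes = summarize_labels(example_labels)
print(n_clusters, n_noise, sizes)
```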

+2

mayTheCodeBeWithYou Feb 14 '15 at 3:23

After working for a while with the code from the accepted answer (the hand-written DBSCAN class), I have concluded that it has significant problems: 1) noise points can reappear in later clusters; 2) it produces extra clusters that are subsets of previously built clusters, because of problems with how visited and unexplored points are tracked, which results in clusters with fewer than min_points members; and 3) some points can end up in two clusters: they are reachable from both, and in this code such a point may even be a core point of one of them.

The standard DBSCAN algorithm places every core point in the cluster whose core it belongs to, and places a point that is merely reachable from two clusters in the first cluster from which it is found to be reachable. This makes the assignment of such points depend on the order of the points in the data, but every point appears exactly once in the output, either in a single cluster or as noise. Some applications want these shared points, which are reachable from two clusters, to be placed in both clusters, but core points should still appear in only one cluster.

So this is what I came up with. It computes the distance between each pair of points twice and does not use any tree structure, but it immediately discards points that have no near neighbours and builds a set of candidate core points, so that only those points need to be considered when the cluster cores are built. It uses sets for membership tests. Note that this implementation places shared points in every cluster from which they are reachable.

class DBSCAN(object):
    def __init__(self, eps=0, min_points=2):
        self.eps = eps
        self.min_points = min_points
        self.noise = []
        self.clusters = []
        self.dp = []
        self.near_neighbours = []
        self.wp = set()
        self.proto_cores = set()

    def cluster(self, points):
        c = 0
        self.dp = points
        self.near_neighbours = self.nn(points)
        while self.proto_cores:
            near_points = set(self.proto_cores)
            for p in near_points:
                c += 1
                core = self.add_core(self.near_neighbours[p])
                complete_cluster = self.expand_cluster(core)
                self.clusters.append(["Cluster: %d" % c, complete_cluster])
                self.proto_cores -= core
                break
        for pt in self.dp:
            flag = True
            for c in self.clusters:
                if pt in c[1]:
                    flag = False
            if flag:
                self.noise.append(pt)

    # set up dictionary of near neighbours,create working_point and proto_core sets
    def nn(self, points):
        self.wp = set()
        self.proto_cores = set()
        i = -1
        near_neighbours = {}
        for p in points:
            i += 1
            j = -1
            neighbours = []
            for q in points:
                j += 1
                distance = (((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
                             + (q[2] - p[2]) ** 2) ** 0.5)
                if distance <= self.eps:
                    neighbours.append(j)
            near_neighbours[i] = neighbours
            if len(near_neighbours[i]) > 1:
                self.wp |= {i}
            if len(near_neighbours[i]) >= self.min_points:
                self.proto_cores |= {i}
        return near_neighbours

    # add cluster core points
    def add_core(self, neighbours):
        core_points = set(neighbours)
        visited = set()
        while neighbours:
            points = set(neighbours)
            neighbours = set()
            for p in points:
                visited |= {p}
                if len(self.near_neighbours[p]) >= self.min_points:
                    core_points |= set(self.near_neighbours[p])
                    neighbours |= set(self.near_neighbours[p])
            neighbours -= visited
        return core_points

    # expand cluster to reachable points and rebuild actual point values
    def expand_cluster(self, core):
        core_points = set(core)
        full_cluster = []
        for p in core_points:
            core |= set(self.near_neighbours[p])
        for point_number in core:
            full_cluster.append(self.dp[point_number])
        return full_cluster
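
For comparison, here is a minimal sketch of the standard behaviour described above, where a border point goes to whichever cluster reaches it first and every point gets exactly one label. It is not the class above and not any library's implementation; the names `dbscan_labels` and `region_query`, the -1 noise label, and the sample points are assumptions chosen for illustration. It uses `math.dist`, so it needs Python 3.8 or later.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan_labels(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_points:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: first cluster to reach it wins
            if labels[j] is not UNVISITED:
                continue  # already assigned, never moved to another cluster
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Two dense groups on the x axis with one border point (x = 1.8) between them.
points = [(x, 0.0, 0.0) for x in (0.0, 0.3, 0.6, 0.9, 1.8, 2.7, 3.0, 3.3, 3.6)]
border = (1.8, 0.0, 0.0)

forward = dict(zip(points, dbscan_labels(points, eps=1.0, min_points=4)))
reversed_points = points[::-1]
backward = dict(zip(reversed_points,
                    dbscan_labels(reversed_points, eps=1.0, min_points=4)))

print(forward[border] == forward[(0.0, 0.0, 0.0)])    # grouped with the left cluster
print(backward[border] == backward[(3.6, 0.0, 0.0)])  # grouped with the right cluster
```

Running the same data in reverse order moves the border point to the other cluster, which is the order dependence mentioned above. An implementation that wants shared points in both clusters has to give up the one-label-per-point representation, as the class above does.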

0

Pmason 05 May '17 at 22:09

user2909415 · Accepted Answer · 2014-10-08T20:10:08+0000

So this is what I came up with. I know it is not the most efficient implementation, but it works. For example, the region query, which dominates the algorithm's running time, calculates the distance between the same two points more than once instead of storing it for later use.

class DBSCAN(object):

    def __init__(self, eps=0, min_points=2):
        self.eps = eps
        self.min_points = min_points
        self.visited = []
        self.noise = []
        self.clusters = []
        self.dp = []

    def cluster(self, data_points):
        self.visited = []
        self.dp = data_points
        c = 0
        for point in data_points:
            if point not in self.visited:
                self.visited.append(point)
                neighbours = self.region_query(point)
                if len(neighbours) < self.min_points:
                    self.noise.append(point)
                else:
                    c += 1
                    self.expand_cluster(c, neighbours)

    def expand_cluster(self, cluster_number, p_neighbours):
        cluster = ("Cluster: %d" % cluster_number, [])
        self.clusters.append(cluster)
        new_points = p_neighbours
        while new_points:
            new_points = self.pool(cluster, new_points)

    def region_query(self, p):
        result = []
        for d in self.dp:
            distance = (((d[0] - p[0])**2 + (d[1] - p[1])**2 + (d[2] - p[2])**2)**0.5)
            if distance <= self.eps:
                result.append(d)
        return result

    def pool(self, cluster, p_neighbours):
        new_neighbours = []
        for n in p_neighbours:
            if n not in self.visited:
                self.visited.append(n)
                n_neighbours = self.region_query(n)
                if len(n_neighbours) >= self.min_points:
                    new_neighbours = self.unexplored(p_neighbours, n_neighbours)
            for c in self.clusters:
                if n not in c[1] and n not in cluster[1]:
                    cluster[1].append(n)
        return new_neighbours

    @staticmethod
    def unexplored(x, y):
        z = []
        for p in y:
            if p not in x:
                z.append(p)
        return z
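
One possible way to avoid the repeated distance calculations mentioned above is to build all neighbour sets up front, computing each pairwise distance only once and storing the result for both points. This is a rough sketch rather than part of the class: the function name `build_neighbours` and the sample points are made up for illustration, and it assumes Python 3.8 or later for `math.dist`. It is still quadratic in the number of points; a spatial index such as a k-d tree would be the usual next step for large data sets.

```python
import math
from itertools import combinations


def build_neighbours(points, eps):
    """Map each point index to the set of indices within eps (itself included)."""
    neighbours = {i: {i} for i in range(len(points))}
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= eps:  # evaluated once per unordered pair
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


# Hypothetical 3D points: two close together and one far away.
sample = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(build_neighbours(sample, eps=1.0))
```

A region query then becomes a dictionary lookup by point index, and comparing `len(neighbours[i])` with `min_points` identifies the core points without any further distance calculations.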
