Scikit-learn kmeans custom distance

I want to use the kmeans algorithm to cluster some data, but I would like to use a custom distance function. Is there a way to change the distance function scikit-learn uses?

I would also settle for another framework / module that lets me swap in my own distance function and can run the computation in parallel (I would like to speed up the computation, which is a nice feature of scikit-learn).

Any suggestions?

1 answer


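You can try spectral clustering, which accepts a precomputed matrix, so you can build it from any distance you like. Note that it works on an affinity (similarity) matrix rather than raw distances, so the distances need to be converted first, for example with a Gaussian kernel.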

It performs about as well as K-means on convex clusters, and it also works on non-convex problems because it detects connectivity.



The good news is that spectral clustering is also implemented in scikit-learn.
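Here is a minimal sketch of how this could look, not a definitive implementation. It assumes scikit-learn's `SpectralClustering` with `affinity="precomputed"` and `pairwise_distances` with a callable metric. The function `custom_distance`, the toy data, and the bandwidth `sigma` are made up for illustration; replace them with your own metric and tune `sigma` for your data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances


def custom_distance(a, b):
    # Hypothetical example metric (Manhattan / L1 distance).
    # Replace with whatever distance you actually need.
    return np.abs(a - b).sum()


# Toy data (an assumption for this sketch): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(30, 2)),
])

# Pairwise distance matrix using the custom metric.
# n_jobs=-1 asks scikit-learn to spread the work over all cores.
D = pairwise_distances(X, metric=custom_distance, n_jobs=-1)

# Spectral clustering expects similarities, not distances, so convert
# with a Gaussian kernel. sigma is an assumed bandwidth; tune it.
sigma = 2.0
A = np.exp(-(D ** 2) / (2.0 * sigma ** 2))

model = SpectralClustering(
    n_clusters=2,
    affinity="precomputed",
    random_state=0,
)
labels = model.fit_predict(A)
```

One caveat: with a pure-Python metric the speedup from `n_jobs` may be limited, so a vectorized or built-in metric is likely to be faster if one fits your problem. The full distance matrix also needs memory quadratic in the number of samples.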

Hope it helps.
