How to cluster angular multidimensional data? Distance measures and algorithms

I would like to cluster a set of multidimensional vectors (n > 10) in which each attribute is an angle. What distance measures and algorithms can I use?

I thought of:
- Manhattan distance
- taking the max / min of the distances between attribute pairs ( http://www.ncbi.nlm.nih.gov/pubmed/9390236 )
- summing the angular distances between all attribute pairs

When it comes to distance measures, Euclidean distance seems very natural and intuitive, even for objects located in multidimensional space. However, I haven't found an equivalent of it for angles.

And algorithms:
- affinity propagation
- DBSCAN
- in general, scikit-learn algorithms, excluding K-Means. ( http://scikit-learn.org/stable/modules/clustering.html#clustering )

Here are some examples:
['179.5', '58.8', '78.2', '211.8', '295.6', '194.9', '9.3', '328.3', '40.9', '323.1', '17.2']
['171.4', '74.9', '81.5', '204.4', '284.1', '193.8', '2.1', '326.7', '49.3', '310.4', '30.5']
['64.2', '119.8', '147.2', '213.0', '167.4', '256.4', '349.4', '28.3', '325.6', '29.6', '348.0']
By the way, these numbers are dihedral angles.

2 answers


Consider mapping each angle to a point on the unit circle. That way the distance stays small even when two angles are close to -pi and pi. This means that each vector goes from n-dimensional to 2n-dimensional.

Then I would try all the usual distance measures.
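A minimal sketch of that mapping, assuming the angles are given in degrees. The helper names `embed_angles` and `euclidean` are made up for illustration and are not from any library:

```python
import math

def embed_angles(angles_deg):
    """Map n angles (degrees) to a 2n-dimensional vector of (cos, sin) pairs."""
    out = []
    for a in angles_deg:
        r = math.radians(a)
        out.extend((math.cos(r), math.sin(r)))
    return out

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

# Illustrative values only: 179 and -179 are 2 degrees apart on the circle.
a = embed_angles([179.0, 10.0])
b = embed_angles([-179.0, 12.0])
c = embed_angles([0.0, 10.0])

d_ab = euclidean(a, b)  # small, because the wrap at +/-180 is handled
d_ac = euclidean(a, c)  # large, because 179 and 0 are almost opposite
```

Note that the Euclidean distance in the embedded space is a chord length, not the arc length, so it grows more slowly than the angular difference as two angles approach being opposite.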



If you plan on using k-means, you should really map the data into Euclidean space, i.e. map each angle to (sin(angle), cos(angle)). The reason is that otherwise the mean does not make sense: the mean of the angles -179 and +179 should be -180 (or +180), but computed naively it is 0, which is the exact opposite direction!
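To make the -179 / +179 example concrete, here is a small sketch that compares the naive arithmetic mean with a mean taken through the (sin, cos) mapping. The function name `circular_mean_deg` is hypothetical, and angles are assumed to be in degrees:

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction: average the (cos, sin) points, then convert back to an angle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

angles = [-179.0, 179.0]

naive = sum(angles) / len(angles)  # 0.0, which points the wrong way
circ = circular_mean_deg(angles)   # magnitude 180, i.e. +180 or equivalently -180
```

The mean direction is undefined when the averaged point lands on the origin (for example for 0 and 180), so code that relies on it may want to check for that case.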

If you try other algorithms such as HAC, PAM, CLARA, DBSCAN, OPTICS, etc., you can define a custom distance function that handles the 360° wraparound. For example, for each pair of corresponding angles x and y you can use

min(abs(x-y), 360-abs(x-y))

and then take the sum or the sum of squares over all attributes.

But this approach doesn't work with k-means!
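As a rough sketch of the custom distance, the wraparound formula can be applied per attribute and summed, and the result collected into a pairwise distance matrix. The function names are illustrative, and the sample vectors are the first four angles of each example in the question. The extra `% 360` is an assumption added so that inputs outside a single 0..360 range are still handled:

```python
def angular_diff(x, y):
    """Smallest difference between two angles in degrees, in [0, 180]."""
    d = abs(x - y) % 360.0
    return min(d, 360.0 - d)

def wrap_distance(u, v, squared=False):
    """Sum (or sum of squares) of per-attribute angular differences."""
    diffs = [angular_diff(x, y) for x, y in zip(u, v)]
    if squared:
        return sum(d * d for d in diffs)
    return sum(diffs)

def distance_matrix(vectors):
    """Symmetric matrix of pairwise wrap_distance values."""
    n = len(vectors)
    return [[wrap_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

# First four angles of each example vector from the question.
data = [
    [179.5, 58.8, 78.2, 211.8],
    [171.4, 74.9, 81.5, 204.4],
    [64.2, 119.8, 147.2, 213.0],
]
D = distance_matrix(data)  # the first two vectors end up closest to each other
```

A matrix like `D` can then be handed to any implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`. The `eps` threshold would have to be chosen in the same units (summed degrees).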
