Efficient point density for a large number of coordinates

Hello everyone and sorry for the long post

I have an array of 3D coordinates (x, y, z) stored as a 2D array with shape (13720, 3). I would like to make a density map of the coordinate points so that I can see in which areas there are many observations and where we never see anything. Note that there is no value associated with the coordinates, but if it is easier to work with values associated with coordinates, I could make a grid spanning the entire volume and assign 1 where there are observations and 0 where there are none.
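Just to illustrate the 0/1 grid I have in mind, here is a rough sketch (the bin count of 50 is only an example, and coords is my (13720, 3) array of positions):

import numpy as np

# sketch of the occupancy-grid idea: bin the volume and mark each cell
# with 1 if at least one observation falls in it, 0 otherwise
nBins = 50  # example resolution, not tuned
counts, edges = np.histogramdd(coords, bins=nBins)
occupancy = (counts > 0).astype(int)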

There is already a good thread on this: How to plot a 3D density map in python using matplotlib

but when I run the code below I run into infinity and/or NaN problems:

import numpy as np
from os import listdir
from scipy.stats import gaussian_kde
from mayavi import mlab  # will be used for the 3D plot once gaussian_kde works

Files = listdir(corrPath)  # corrPath is the directory holding my data files
numHeaders = 2

coords = []
for f in Files:
    k = 0
    if f[:3] == 'rew':
        fid = open(corrPath + f, 'r')
        for line in fid:
            k += 1
            if k > numHeaders:
                Info = line.split()
                coords.append(Info[0:3])
        fid.close()


coords = np.array(coords, dtype='float')
kde = gaussian_kde(coords)  # very, very slow - all 8 GB of RAM is swallowed - maybe
                            # multiprocessing will speed it up though - see SO thread

# kde = gaussian_kde(coords.astype(int))  # throws singular matrix error
                                          # according to np.linalg.cond the condition
                                          # number is around 4.22

density = kde(coords)


which issues these warnings:

path/python2.7/site-packages/scipy/stats/kde.py:231:
RuntimeWarning: overflow encountered in exp
  result = result + exp(-energy)
path/python2.7/site-packages/scipy/stats/kde.py:240:
RuntimeWarning: divide by zero encountered in true_divide
  result = result / self._norm_factor


and the resulting density is

In [16]: density
Out[16]: array([ inf,  inf,  inf])


I have looked through the documentation for histogramdd ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html ) to see if I can bin the data points, both to speed things up and to avoid the matrix-inversion problems, but I cannot get it to work the way I would like. This is my code:

numBins = 280
binCoords, edges = np.histogramdd(coords,bins=(numBins,numBins,numBins))


As far as I can see, it puts each column into its own dataset every time, so I don't think I can use it to speed up stats.gaussian_kde. I see from the thread I linked that multiprocessing can speed up the evaluation - that would be nice, but I'm not really sure how to implement it, since I don't completely understand the last script in that answer (just in case you're wondering why that optimization isn't in my code). A rough sketch of what I imagine the multiprocessing approach would look like is below.
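This is only my guess at how splitting the evaluation across processes might look - assuming kde is an already-fitted gaussian_kde, pts is the array of points I want to evaluate it at (in whatever shape kde accepts), and that the workers can see the module-level kde (which I believe works via fork on Linux, but I haven't verified it):

import numpy as np
from multiprocessing import Pool

def evaluate_chunk(chunk):
    # evaluate the module-level kde on one chunk of evaluation points
    return kde(chunk)

# split the evaluation points into chunks along the last axis,
# score each chunk in a separate process, then stitch the results together
chunks = np.array_split(pts, 8, axis=-1)
pool = Pool(processes=8)
density = np.concatenate(pool.map(evaluate_chunk, chunks))
pool.close()
pool.join()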

Aside from this feeble attempt at binning the data, I'm not sure how to proceed. Any input would be appreciated :)
