Efficient point density for a large number of coordinates


Hello everyone, and sorry for the long post.

I have an array of 3-dimensional coordinates (x, y, z) stored in a 2D array of shape (13720, 3). I would like to make a point density map of the coordinates so I can see in which areas there are many observations and where we never see anything. Note that there is no value associated with the coordinates - but if it is easier to work with values, I could make a grid spanning the entire volume and assign 1 to the observations and 0 where there are none, as sketched below.
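For illustration, this is roughly what I mean by the grid idea (a minimal sketch; the data here is a random placeholder for my real coordinates, and nBins is a hypothetical resolution):

import numpy as np

# coords: (N, 3) array of x, y, z observations (placeholder data)
coords = np.random.rand(13720, 3)

# Occupancy grid spanning the bounding box of the data: cells that
# contain at least one observation get 1, empty cells get 0.
nBins = 50  # hypothetical resolution
counts, edges = np.histogramdd(coords, bins=(nBins, nBins, nBins))
occupancy = (counts > 0).astype(int)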

There is already a nice thread about this here: How to plot a 3D density map in python with matplotlib

but when I run the code below, I run into problems with infinity and/or NaN:

import numpy as np
from os import listdir
from scipy.stats import gaussian_kde
from mayavi import mlab  # will be used for the 3D plot once gaussian_kde works

# corrPath is the directory containing the data files (defined elsewhere)
Files = listdir(corrPath)
numHeaders = 2

coords = []
for f in Files:
    if f[:3] == 'rew':
        with open(corrPath + f, 'r') as fid:
            for k, line in enumerate(fid, start=1):
                if k > numHeaders:
                    Info = line.split()
                    coords.append(Info[0:3])

coords = np.array(coords, dtype='float')
kde = gaussian_kde(coords)  # very, very slow - all 8 GB of RAM is swallowed; maybe
                            # multiprocessing will speed it up - see the linked SO thread

# kde = gaussian_kde(coords.astype(int))  # throws a singular matrix error;
                                          # according to np.linalg.cond the
                                          # condition number is around 4.22

density = kde(coords)

which throws the warnings

path/python2.7/site-packages/scipy/stats/kde.py:231: RuntimeWarning: overflow encountered in exp
  result = result + exp(-energy)
path/python2.7/site-packages/scipy/stats/kde.py:240: RuntimeWarning: divide by zero encountered in true_divide
  result = result / self._norm_factor

and the density is

In [16]: density
Out[16]: array([ inf,  inf,  inf])
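One thing I am unsure about is the input orientation: as far as I can tell from the SciPy docs, gaussian_kde expects the dataset with shape (# of dimensions, # of points), so my (13720, 3) array may be read as 3 samples in 13720 dimensions - which would also fit the singular matrix error and the three inf values above. A minimal sketch of what I think the intended call looks like:

# gaussian_kde wants variables in rows and samples in columns,
# so transpose the (13720, 3) array to shape (3, 13720).
kde = gaussian_kde(coords.T)   # 3-dimensional KDE over 13720 points
density = kde(coords.T)        # one density value per observation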

I have looked at the documentation for histogramdd ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html ) to see if I could bin the data points, and that way speed up the calculation and avoid the problems with inverting the matrix, but I cannot get it to work the way I want it to. This is my code:

numBins = 280
# np.histogramdd returns the 3D array of counts plus the bin edges per axis
counts, edges = np.histogramdd(coords, bins=(numBins, numBins, numBins))

As far as I can see, it bins every column of my data set individually, so I do not think I can use it to speed up scipy.stats.gaussian_kde. I can see from the thread I linked that multiprocessing could speed up the evaluation - that would be nice, but I am not quite sure how to implement it; I do not understand the last script in that answer entirely (in case you are wondering why that optimization is not in my code).
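If binning is viable at all, one idea (just a sketch, assuming the transposed input from above; nGrid is a placeholder) would be to evaluate the KDE on a coarse regular grid instead of at every single observation, since mlab needs gridded values for a volume plot anyway:

# Evaluate the KDE on a coarse regular grid spanning the data.
nGrid = 40  # placeholder; trades resolution for speed
axes = [np.linspace(coords[:, d].min(), coords[:, d].max(), nGrid)
        for d in range(3)]
X, Y, Z = np.meshgrid(*axes, indexing='ij')
gridPoints = np.vstack([X.ravel(), Y.ravel(), Z.ravel()])  # shape (3, nGrid**3)
densityGrid = kde(gridPoints).reshape(X.shape)
# densityGrid could then be passed to e.g. mlab.contour3d(X, Y, Z, densityGrid)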

Besides this feeble attempt at binning the data, I am not sure how to proceed. Any input would be greatly appreciated :)
