Hello everyon and sorry for the long post
i have an array of 3-dimensional coordinates (x,y,z) in a 2D array with size (13720,3). i would like to make a point density map of the coordinates so i can see which areas are there many observations, and where we never see anything. Note that there is no value associated with coordinates - but if it is easier to work with values associated with coordinates, i could make a grid spanning the entire volume and assign 1 to the observations and 0 to where there are no observations.
There is already as nice thread about this here: How to plot a 3D density map in python with matplotlib
but when i run the code below i run into problems with infinity and/or nan
import numpy as np
from os import listdir
from scipy import stats
from mayavi import mlab # will be used for 3d plot when gaussian_kde works
Files = listdir(corrPath)
numHeaders = int(2)
coords = []
for f in Files:
k = int(0)
if f[:3] == 'rew':
fid = open(corrPath+f,'r')
for line in fid:
k += 1
if k > numHeaders:
Info = line.split()
coords.append(Info[0:3])
fid.close()
coords = np.array(coords,dtype='float')
kde = gaussian_kde(coords) # very, very slow - all 8Ggyte of RAM is swallowed - maybe
# multiprocessing will speed it up though - see SO thread
# kde = gaussian_kde(coords.astype(int)) # throws singular matrix error
# according to np.linalg.cond the condition
# number is around 4.22
density = kde(coords)
which throws the warnings
path/python2.7/site-packages/scipy/stats/kde.py:231:
RuntimeWarning: overflow encountered in exp result = result + exp(-energy)
path/python2.7/site-packages/scipy/stats/kde.py:240:
RuntimeWarning: divide by zero encountered in true_divide
result = result / self._norm_factor
and the density is
in[16]: density
Out[16]: array([ inf, inf, inf])
i have looked at the documentation for histogramdd ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html ) to see if i could perhaps bin the data points and that way speed up the calculation and avoid the problems with inverting the matrix. but i cannot get it to work like at want i to. This is my code
numBins = 280
binCoords, edges = np.histogramdd(coords,bins=(numBins,numBins,numBins))
as far as i can see, it bins every column in my data set individually, so i do not think i can use that to speed up stat.gaussian_kde. I can see from the thread i have linked that multiprocessing could speed up the evaluation - that would be nice, but i am not quite sure how to implement it - i do not understand his last script entirely (just if you are wondering why his optimization is not in my code)
besides the feeble attempt of binning the data, i am not sure how to proceed. Any input would be greatly appreciated :)