I have a matrix whose first column is X, second is Y, and third is Z (a point cloud from Earth). Among the points are outliers, i.e. points that lie far below or far outside the rest (because of systematic errors). I create a distance matrix and calculate the distance from every point to all of the other points using the code below:
xl = selected(:,1);
yl = selected(:,2);
zl = selected(:,3);
distanceMatrix = zeros(size(selected,1));
x = [xl(:)'; yl(:)'; zl(:)'];
IP = x' * x;
distanceMatrix = sqrt(bsxfun(@plus, diag(IP), diag(IP)') - 2 * IP);
`selected` is my matrix. I then count the neighbors of each point and treat points that have only 1 or 2 neighbors as outliers.
But because my matrix is too large, the distance matrix does not fit in memory and my laptop cannot process it (out of memory: 4 GB!).
Is there a method, function, or code that finds the outliers automatically without calculating the distance matrix?
Your code could be made more efficient. First, note that your `x` is simply `selected'`. Second, all your code could be replaced by a single expression built from `pdist` and `squareform` (see their documentation). In addition to making the code much simpler, this may help in reducing memory usage.

If memory is still an issue, you may have to work in chunks, computing the distances from the points in the current chunk to all points. You can use `pdist2` for this (a generalized version of `pdist` that allows two different inputs and does not require `squareform`).
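As a rough sketch of how the chunked idea might look (not necessarily the exact code originally posted here): the loop below counts neighbors one chunk at a time with `pdist2`, so only a `chunkSize`-by-N block of distances is held in memory at once. The demo data and the values of `radius`, `minNeighbors`, and `chunkSize` are assumptions for illustration only; the question does not say which distance counts as "neighbor", so pick a radius that suits your data. `pdist`, `pdist2`, and `squareform` require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for the point cloud; replace with your own N-by-3 matrix.
rng(0);                                      % reproducible demo data
selected = [rand(500, 3); 5 5 5; -4 6 -7];   % 500 points in the unit cube plus 2 far-away points

radius       = 0.5;   % assumed neighbor radius (not given in the question)
minNeighbors = 3;     % points with fewer neighbors (0, 1 or 2) are flagged
chunkSize    = 100;   % assumed chunk size; tune to the available memory

n = size(selected, 1);
neighborCount = zeros(n, 1);

for first = 1:chunkSize:n
    last = min(first + chunkSize - 1, n);
    % Distances from the points in this chunk to all points: (last-first+1)-by-n
    d = pdist2(selected(first:last, :), selected);
    % Count points within the radius; subtract 1 to exclude the point itself
    neighborCount(first:last) = sum(d <= radius, 2) - 1;
end

isOutlier = neighborCount < minNeighbors;   % logical N-by-1 mask
cleaned   = selected(~isOutlier, :);        % point cloud without flagged points

% Cross-check for small inputs only: the full N-by-N matrix in one line.
% Skip this on the real data, since it is exactly what runs out of memory.
fullMatrix = squareform(pdist(selected));
fullCount  = sum(fullMatrix <= radius, 2) - 1;
```

Each pass needs about `chunkSize * n * 8` bytes for `d`, so lowering `chunkSize` trades speed for memory. On the demo data, the two appended far-away points have no neighbors within `radius` and are therefore flagged.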