How to remove outliers?


I have a matrix whose first column is X, second is Y and third is Z (a point cloud of the Earth's surface). Among the points are outliers, i.e. points that lie far below or far outside the rest (because of systematic errors). I build a distance matrix holding the distance from every point to every other point, using the code below:

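xl = selected(:,1);
yl = selected(:,2);
zl = selected(:,3);
x = [xl(:)'; yl(:)'; zl(:)'];   % 3-by-N matrix of coordinates
IP = x' * x;                    % N-by-N matrix of inner products
% |a-b|^2 = |a|^2 + |b|^2 - 2*a.b; max(...,0) removes tiny negative values
% caused by rounding, which would otherwise make sqrt return complex numbers
distanceMatrix = sqrt(max(bsxfun(@plus, diag(IP), diag(IP)') - 2 * IP, 0));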

selected is my matrix. I then count the neighbors of each point and treat points that have only 1 or 2 neighbors as outliers. But because my matrix is very large, my laptop cannot process it (out of memory with 4 GB!).
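For scale, a rough estimate assuming double precision (8 bytes per entry): one full N-by-N distance matrix needs 8N² bytes, so even a single such matrix fits in 4 GB only when

$$
8N^2 \le 4 \cdot 2^{30}\ \text{bytes} \quad\Longrightarrow\quad N \le \sqrt{2^{29}} \approx 23\,170 .
$$

The code above keeps several N-by-N arrays alive at the same time (IP, the bsxfun result, the intermediate difference), so the practical limit on a 4 GB machine is lower still.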

Is there a method, function, or piece of code that detects outliers automatically without computing the full distance matrix?

Best answer:

Your code could be made more efficient. First, note that your x is simply selected'. Second, all your code could be replaced by this:

distanceMatrix = squareform(pdist(selected));

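(see the documentation of pdist and squareform, both in the Statistics and Machine Learning Toolbox). Besides making the code much simpler, this may reduce memory usage: pdist returns the N(N-1)/2 pairwise distances as a vector and avoids the extra N-by-N temporaries of your code (IP, the bsxfun result, and so on), although squareform still expands the result into one full N-by-N matrix.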

If memory is still an issue, you may have to work in chunks, computing the distances from the points in the current chunk to all points. You can use pdist2 (a generalization of pdist that accepts two different sets of points and returns a full matrix, so it does not need squareform):

n = size(selected,1);
chunkSize = 100; %// use the largest value your memory permits
for first = 1:chunkSize:n
    ind = first:min(first+chunkSize-1, n); %// indices of points in current chunk
                                           %// (the last chunk may be smaller)
    distanceMatrix = pdist2(selected,selected(ind,:)); %// n-by-numel(ind) matrix:
    %// distance from every point (rows) to the points in current chunk (columns)

    %// Decide which points of the current chunk are outliers, based on
    %// computed distanceMatrix
end
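Below is a minimal sketch of the chunked approach with the question's "only 1 or 2 neighbors" rule filled in. It assumes that "neighbor" means "closer than a fixed radius"; the question does not give a radius, so radius, maxNeighbors and the grid test data are hypothetical. Instead of pdist2 it uses base MATLAB only (bsxfun and the same |a|^2 + |b|^2 - 2*a.b expansion as the question's code), so it also runs without the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: chunked neighbor counting that never holds the full n-by-n
% distance matrix. Base MATLAB only (no toolbox needed).
% 'radius' and 'maxNeighbors' are HYPOTHETICAL parameters: tune them for your data.

% Hypothetical test data: a 20-by-20-by-5 grid of points spaced 1 unit apart,
% plus two isolated points, one far below and one far outside the grid.
[gx, gy, gz] = ndgrid(1:20, 1:20, 1:5);
selected = [gx(:), gy(:), gz(:); 10 10 -50; 100 100 3];

radius = 1.5;       % hypothetical: points closer than this count as neighbors
maxNeighbors = 2;   % points with at most this many neighbors are outliers
chunkSize = 500;    % use the largest value your memory permits

n = size(selected, 1);
sqNorms = sum(selected.^2, 2);      % n-by-1 squared norms, computed once
neighborCount = zeros(n, 1);
for first = 1:chunkSize:n
    ind = first:min(first + chunkSize - 1, n);   % last chunk may be smaller
    chunk = selected(ind, :);
    % Squared distances from every point (rows) to the chunk points (columns);
    % max(...,0) clamps tiny negative values caused by rounding.
    d2 = bsxfun(@plus, sqNorms, sqNorms(ind)') - 2 * (selected * chunk');
    d2 = max(d2, 0);
    % Count points within 'radius', minus 1 to exclude the point itself.
    neighborCount(ind) = sum(d2 <= radius^2, 1)' - 1;
end

isOutlier = neighborCount <= maxNeighbors;
outliers = find(isOutlier);   % here: the two isolated points, indices 2001 and 2002
disp(outliers')
```

Peak memory is a few n-by-chunkSize arrays instead of n-by-n ones, so chunkSize directly trades memory for the number of loop iterations. Comparing squared distances against radius^2 also avoids taking a square root of every entry.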