Find submatrices of matrix containing nan values such that largest amount of known values is covered by submatrices

92 Views Asked by At

Given a matrix of arbitrary dimensions where under a certain antidiagonal all values are unknown, NaN, I want to extract submatrices without NaN values which are as large as possible such that as many elements of the original matrix are contained in the submatrices. Each submatrix must be at least R by R, where R can be specified by the user. The submatrices may overlap each other. I'm able to find the anti-diagonal under which all elements are NaN, but I don't really know how to divide the non-NaN part of the matrix in submatrices which try to cover as much of the known values as possible.

The following image illustrates how the numeric part of the matrix might be partly divided in submatrices for R = 3: Possible submatrix division

I have the following code:

mask = ~isnan(A);
nrKnownElsRow = sum(mask,2);
dims = [[1:length(nrKnownElsRow)]' nrKnownElsRow];
dims(dims(:,2)<R,:) = [];
dims(dims(:,1)<R,:) = [];
idx = find(dims(:,2) == max(dims(:,2)));
dims(idx(1:end-1),:) = [];
dims = [dims(:,1) zeros(length(dims),1) dims(:,2)];

for k = 1:length(dims)-1
    if dims(k,3) - dims(k+1,3) >= R
        dims(k,2) = dims(k+1,3) + 1;
    else
        dims(k,2) = dims(k,3) - R + 1;
    end
end
dims(end,2) = 1;

The dims variable contains in each row for the corresponding submatrix the row up to which the subtensor goes, then the outer columns that make up the submatrix. This uses the most information possible in the matrix, but the submatrices are not as large as possible.

1

There are 1 best solutions below

0
John Bofarull Guix On

The following solution is not perfect but it may help you get started.

This script catches all submatrixes with 1 or both sides > r (the threshold you defined as R).

close all;clear all;clc

N1=10;N2=10  % A size
% avoid N1=1 N2=1

r=3; % threshold

generating A

A=reshape(randi([1 10],1,N1*N2),N1,N2); % generating A
n0=randi([1 5],1,1)            % define amount NaN to scatter in A
% n0=1 % test
n1=randi([2 N1-1],1,n0);
n2=randi([2 N2-1],1,n0);
A(sub2ind([N1 N2],n1,n2))=NaN

find NaNs coordinates

A2=isnan(A);
[nan_row,nan_col]=find(A2>0)
nan_row=sort(nan_row);nan_col=sort(nan_col);

% calculate coordinates combinations, no repetition
L1=[1 1;nan_row,nan_col;N1 N2];
L2x=nchoosek(L1(:,2),2) % 1D coord combinations
L2y=nchoosek(L1(:,1),2)

Now L2x and L2y contain all single-coordinate combinations

combining L2x and L2y into single variable L3

[cx,cy]=meshgrid([1:1:size(L2x,1)],[1:1:size(L2y,1)])

L3=[L2x(cx(:),:) L2y(cy(:),:)]      

trim borders of all submatrixes
for k1=1:1:size(L3,1)                  
    if L3(k1,1)==1 && L3(k1,2)<N1
        L3(k1,2)=L3(k1,2)-1;
    end
    if L3(k1,1)>1 && L3(k1,2)==N1
        L3(k1,1)=L3(k1,1)+1;
    end
    if L3(k1,1)>1 && L3(k1,2)<N1
        L3(k1,1)=L3(k1,1)-1;
        L3(k1,2)=L3(k1,2)+1;
    end
end
for k1=1:1:size(L3,1)                 
    if L3(k1,3)==1 && L3(k1,4)<N2
        L3(k1,4)=L3(k1,4)-1;
    end
    if L3(k1,3)>1 && L3(k1,4)==N2
        L3(k1,3)=L3(k1,3)+1;
    end
    if L3(k1,3)>1 && L3(k1,4)<N2
        L3(k1,3)=L3(k1,3)-1;
        L3(k1,4)=L3(k1,4)+1;
    end
end
% L3

remove any submatrix containing NaNs

cnan=zeros(1,size(L3,1))
for k3=1:1:numel(cnan)
    A4=A(L3(k3,3):L3(k3,4),L3(k3,1):L3(k3,2))
    if nonzeros(isnan(A4));
        cnan(k3)=1   % contains NaN(s)
    end
end
L3(cnan>0,:)=[] 

calculating all submatrix combinations, pairs, no repetition

nL3=nchoosek(size(L3,1),2)  % how many pairs
cnL3=nchoosek([1:size(L3,1)],2)

attaching marker to each cnL3 line telling whether a1(a2), or a2(a1), or a1==a2.

cnL3 format : [ ... A51_contains_A52 A52_contains_A51]

cnL3=[cnL3 zeros(size(cnL3,1),2)]

who contains who

for k3=1:1:nL3
    d1=L3(cnL3(k3,1),:)
    d2=L3(cnL3(k3,2),:)
    
    if (d1(1)<d2(1) && d1(2)>d2(2) && d1(3)<d2(3) && d1(4)>d2(4)) || ...  % A51 contains A52
       (d1(1)<=d2(1) && d1(2)>=d2(2) && d1(3)<=d2(3) && d1(4)>d2(4)) || ...  % 3 side    
       (d1(1)<=d2(1) && d1(2)>=d2(2) && d1(3)<d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<=d2(1) && d1(2)>d2(2) && d1(3)<=d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<d2(1) && d1(2)>=d2(2) && d1(3)<=d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<=d2(1) && d1(2)>=d2(2) && d1(3)<d2(3) && d1(4)>d2(4)) || ...    % 2 side
       (d1(1)<=d2(1) && d1(2)>d2(2) && d1(3)<d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<d2(1) && d1(2)>d2(2) && d1(3)<=d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<d2(1) && d1(2)>=d2(2) && d1(3)<=d2(3) && d1(4)>d2(4)) || ...
       (d1(1)<=d2(1) && d1(2)>d2(2) && d1(3)<d2(3) && d1(4)>d2(4)) || ...   % 1 side
       (d1(1)<d2(1) && d1(2)>d2(2) && d1(3)<d2(3) && d1(4)>=d2(4)) || ...
       (d1(1)<d2(1) && d1(2)>d2(2) && d1(3)<=d2(3) && d1(4)>d2(4)) || ...
       (d1(1)<d2(1) && d1(2)>=d2(2) && d1(3)<d2(3) && d1(4)>d2(4))
          cnL3(k3,3)=1; 
    end
  
    if (d2(1)<d2(1) && d2(2)>d2(2) && d2(3)<d2(3) && d2(4)>d2(4)) || ...   % A52 contains A51
       (d2(1)<=d2(1) && d2(2)>=d2(2) && d2(3)<=d2(3) && d2(4)>d2(4)) || ...    
       (d2(1)<=d2(1) && d2(2)>=d2(2) && d2(3)<d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<=d2(1) && d2(2)>d2(2) && d2(3)<=d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<d2(1) && d2(2)>=d2(2) && d2(3)<=d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<=d2(1) && d2(2)>=d2(2) && d2(3)<d2(3) && d2(4)>d2(4)) || ...    
       (d2(1)<=d2(1) && d2(2)>d2(2) && d2(3)<d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<d2(1) && d2(2)>d2(2) && d2(3)<=d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<d2(1) && d2(2)>=d2(2) && d2(3)<=d2(3) && d2(4)>d2(4)) || ...
       (d2(1)<=d2(1) && d2(2)>d2(2) && d2(3)<d2(3) && d2(4)>d2(4)) || ...   
       (d2(1)<d2(1) && d2(2)>d2(2) && d2(3)<d2(3) && d2(4)>=d2(4)) || ...
       (d2(1)<d2(1) && d2(2)>d2(2) && d2(3)<=d2(3) && d2(4)>d2(4)) || ...
       (d2(1)<d2(1) && d2(2)>=d2(2) && d2(3)<d2(3) && d2(4)>d2(4))
          cnL3(k3,4)=1; 
    end
    
    % A51=A(d1(3):d1(4),d1(1):d1(2)) % checks
    % A52=A(d2(3):d2(4),d2(1):d2(2))
  
end

pairs with contained submatrixes

[drow,dcol]=find(cnL3(:,3)==1 | cnL3(:,4)==1)

removing contained submatrixes from submatrix list L3

cnt=[]
for k3=1:1:numel(drow)
    L5=cnL3(drow(k3),:);
    if L5(3)==1 cnt=[cnt L5(2)]; end
    if L5(4)==1 cnt=[cnt L5(1)]; end
end
cnt=unique(cnt)
L4=L3
L4(cnt,:)=[]
L4

X : cell containing all complying submatrices

X={}    
for k1=1:1:size(L4,1)
    X=[X A(L4(k1,3):L4(k1,4),L4(k1,1):L4(k1,2))];
end
X

reading 1 element of cell X

X{1}

how to do a cell all-out

X{:}

Just typing X MATLAB returns all sizes of elements inside cell X

just example 
X 
=
1×4 cell array
Columns 1 through 2
{10×5 double}    {7×10 double}
Columns 3 through 4
{2×10 double}    {10×4 double}

now the threshold r you required

XR=X;
L5=diff(L4,1,2)+1  % calculating sizes
L5(:,[2:2:end])=[]
[rrow,rcol]=find(L5<r)
XR(rrow)=[]
XR{:}

This works for the few cases I checked.

Note I generate test matrixes with no NaNs along perimeter.

Stijn, if this solution helps would you please be so kind to consider green-ticking it as valid answer?

thanks for reading