Replace values with NaN or Inf when certain conditions are met


I created the following three-dimensional mockup matrix:

mockup(:,:,1) = ...
    [100, 100, 100; ...
    103, 95, 100; ...
    101, 85, 100; ...
    96, 90, 102; ...
    91, 89, 99; ...
    97, 91, 97; ...
    105, 83, 100];

mockup(:,:,2) = ...
    [50, NaN, NaN; ...
    47, NaN, 40; ...
    45, 60, 45; ...
    47, 65, 45; ...
    51, 70, 45; ...
    54, 65, 50; ...
    62, 80, 55];

I also defined percentTickerAvailable = 0.5.

The columns represent equity prices of three different assets. For further processing, I need to manipulate the NaN values in the following way:

  1. If the percentage of NaNs in any given ROW is greater than 1 - percentTickerAvailable, replace all values in these particular rows with NaNs. That is, if not enough assets have prices in that particular row, ignore the row completely.
  2. If the percentage of NaNs in any given ROW is less than or equal to 1 - percentTickerAvailable, replace the respective NaNs with -inf.

To be clear, "percentage of NaNs in any given ROW" is calculated as follows: Number of NaNs in any given ROW divided by number of columns.
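For illustration, here is a minimal sketch of that calculation on the second page of the mockup matrix. The names page2, nanShare, dropRow and fillRow are made up for this example and are not part of my function.

```matlab
% Second page of the mockup matrix.
page2 = [50, NaN, NaN; ...
    47, NaN, 40; ...
    45, 60, 45; ...
    47, 65, 45; ...
    51, 70, 45; ...
    54, 65, 50; ...
    62, 80, 55];
percentTickerAvailable = 0.5;

% NaN share per row = number of NaNs in the row / number of columns.
nanShare = sum(isnan(page2), 2) / size(page2, 2);

% Condition 1: rows that should become all NaN.
dropRow = nanShare > 1 - percentTickerAvailable;

% Condition 2: rows that keep their prices but get -inf instead of NaN.
fillRow = nanShare > 0 & ~dropRow;
```

Here row 1 has a NaN share of 2/3, so it falls under condition 1, while row 2 has a share of 1/3 and falls under condition 2.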

The adjusted mockup matrix should look like this:

mockupAdj(:,:,1) = ...
    [100, 100, 100; ...
    103, 95, 100; ...
    101, 85, 100; ...
    96, 90, 102; ...
    91, 89, 99; ...
    97, 91, 97; ...
    105, 83, 100];

mockupAdj(:,:,2) = ...
    [NaN, NaN, NaN; ...
    47, -inf, 40; ...
    45, 60, 45; ...
    47, 65, 45; ...
    51, 70, 45; ...
    54, 65, 50; ...
    62, 80, 55];

So far, I did the following:

function vout = ranking(vin, percentTickerAvailable)

percentNonNaN = 1 - sum(isnan(vin), 2) / size(vin, 2);
NaNIdx = percentNonNaN < percentTickerAvailable;
infIdx = percentNonNaN > percentTickerAvailable & ...
    percentNonNaN < 1;
[~, ~, numDimVin] = size(vin);

for i = 1 : numDimVin
    vin(NaNIdx(:,:,i) == 1, :, i) = NaN;
end

about = vin;

end % EoF

Calling mockupAdj = ranking(mockup, 0.5) already transforms the first row, mockup(1,:,2), correctly to [NaN, NaN, NaN]. However, I am struggling with the second point. With infIdx I have already identified the rows that correspond to the second condition, but I don't know how to use that information to replace the single NaN in mockup(2,2,2) with -inf.

Any hint is highly appreciated.

There are 3 best solutions below

BEST ANSWER

1) "The percentage of NaN in any given row should be smaller than 1"

... Are you talking about a ratio? In that case this check is useless, as it will always be the case. Or are you talking about percentages? In that case your code doesn't do what you describe. My guess is ratio.

2) Based on my guess, I have a follow-up question: following your description, shouldn't mockup(2,2,2) stay NaN? There is 33% (<50%) of NaN in that row, so it does not fulfill your condition 2.

3) Based on the answers I deemed logical, I would change the first line to percentNaN = sum(isnan(vin), 2) / size(vin, 2); for readability, and NaNIdx = percentNaN > 1 - percentTickerAvailable; accordingly (this is equivalent to your percentNonNaN < percentTickerAvailable). Now just add one line in front of your loop:

vin(isnan(vin)) = -inf;

Why? Because this replaces all NaNs with -inf. Later on, the loop overwrites the rows that meet condition 1 with NaN again. You don't need infIdx.

4) Be aware that your function cannot return vout as of now. Just let it return vin, and you'll be fine.
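Putting points 3 and 4 together, the function could look roughly like this. It is only a sketch under the assumption that the ratio reading from point 1 is the intended one, and it keeps your loop as it is.

```matlab
function vin = ranking(vin, percentTickerAvailable)
% Share of NaNs per row and page (size: rows x 1 x pages).
percentNaN = sum(isnan(vin), 2) / size(vin, 2);

% Rows where too few assets have a price (condition 1).
NaNIdx = percentNaN > 1 - percentTickerAvailable;

% First turn every NaN into -Inf ...
vin(isnan(vin)) = -Inf;

% ... then overwrite the sparse rows with NaN, page by page.
for i = 1 : size(vin, 3)
    vin(NaNIdx(:, :, i), :, i) = NaN;
end

end
```

With the mockup matrix from the question, mockupAdj = ranking(mockup, 0.5) should leave the first page untouched, set mockupAdj(1,:,2) to NaN and set mockupAdj(2,2,2) to -Inf.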

ANSWER

This is a good example of something that can be solved using vectorization. I am providing two versions of the code, one that uses the modern syntax (including implicit expansion) and one for older versions of MATLAB.

Several things to note:

  • In the NaN substitution stage, I'm using a "trick" where 0/0 is evaluated to NaN.
  • In the Inf substitution stage, I'm using logical masking/indexing to access the correct elements in vin.
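To see the 0/0 trick in isolation, here is a toy sketch. The names A and rowIsBad are made up for this illustration, and the addition relies on implicit expansion, so it assumes R2016b or newer.

```matlab
% Toy data: rows 1 and 3 are flagged for removal.
A = [1 2; 3 4; 5 6];
rowIsBad = [true; false; true];

% For flagged rows: 0 ./ (1 - 1) = 0/0 = NaN.
% For all other rows: 0 ./ (1 - 0) = 0.
offset = 0 ./ (1 - rowIsBad);

% Adding NaN wipes the flagged rows, adding 0 leaves the others unchanged.
A = A + offset;
% A is now [NaN NaN; 3 4; NaN NaN]
```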

R2016b and newer:

function vin = ranking(vin, percentTickerAvailable)
  % Find the share of NaNs in each row:
  pNaN = mean(isnan(vin), 2, 'double');
  % Fill rows whose NaN share exceeds 1 - percentTickerAvailable with NaNs
  % (0/0 evaluates to NaN, 0/1 evaluates to 0):
  vin = vin + 0 ./ (1 - (pNaN > 1 - percentTickerAvailable));
  % Replace the remaining NaNs with -Inf:
  vin(isnan(vin) & pNaN <= 1 - percentTickerAvailable) = -Inf;
end

Prior to R2016b:

function vin = rankingOld(vin, percentTickerAvailable)
  % Find the share of NaNs in each row:
  pNaN = mean(isnan(vin), 2, 'double');
  % Fill rows whose NaN share exceeds 1 - percentTickerAvailable with NaNs
  % (0/0 evaluates to NaN, 0/1 evaluates to 0):
  vin = bsxfun(@plus, vin, 0 ./ (1 - (pNaN > 1 - percentTickerAvailable)));
  % Replace the remaining NaNs with -Inf:
  vin(bsxfun(@and, isnan(vin), pNaN <= 1 - percentTickerAvailable)) = -Inf;
end

ANSWER

You can also use logical indexing to achieve this task:

x(:,:,1) = ...
    [100, 100, 100; ...
    103, 95, 100; ...
    101, 85, 100; ...
    96, 90, 102; ...
    91, 89, 99; ...
    97, 91, 97; ...
    105, 83, 100];

x(:,:,2) = ...
    [50, NaN, NaN; ...
    47, NaN, 40; ...
    45, 60, 45; ...
    47, 65, 45; ...
    51, 70, 45; ...
    54, 65, 50; ...
    62, 80, 55];

% NaN-share threshold, i.e. 1 - percentTickerAvailable.
tres = 0.5;

% Mark the entries that are NaN.
in = isnan(x);
% Which rows have a NaN share above the threshold?
ind = sum(in, 2) ./ size(x, 2) > tres;
% Expand the row flags to a logical mask of the same size as x.
bad = repmat(ind, 1, size(x, 2), 1);
% Clear the NaN markers in the flagged rows, so that only the NaNs
% of the remaining rows are left in the mask.
in(bad) = false;

% Assign the new values.
x(in) = -inf;
x(bad) = NaN;