Identify NA's in sequence row-wise


I want to fill NA values in a row-wise sequence based on a condition. Please see the example below.

ID | Observation 1 | Observation 2 | Observation 3 | Observation 4 | Observation 5
A  | NA            | 0             | 1             | NA            | NA

The conditions are:

  • all NA values before the non-NA values in the sequence should be left as NA;
  • all NA values after the non-NA values in the sequence should be tagged "Remove".

In the example above, the NA value in Observation 1 should remain NA. However, the NA values in Observations 4 and 5 should be changed to "Remove".

Best answer

You can define the function:

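replace.na <- function(r, val) {
  i <- is.na(r)                    # TRUE where the row holds an NA
  j <- which(i)                    # positions of the NAs
  k <- which(!i)                   # positions of the non-NA values
  r[j[j > k[length(k)]]] <- val    # replace only the NAs after the last non-NA
  r
}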
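As a quick sanity check, the function can be tried on a single vector before it is applied to a data frame. This is only a sketch: x is an illustrative name holding row A from the question, and the definition is repeated so the snippet runs on its own.

```r
# Same definition as above, repeated so the snippet is self-contained.
replace.na <- function(r, val) {
  i <- is.na(r)
  j <- which(i)
  k <- which(!i)
  r[j[j > k[length(k)]]] <- val
  r
}

# Row A from the question (x is an illustrative name).
x <- c(NA, 0, 1, NA, NA)
res <- replace.na(x, 999)
print(res)
## [1]  NA   0   1 999 999

# A row with no non-NA value has nothing to anchor on, so it is left alone.
all_na <- replace.na(c(NA_real_, NA_real_), 999)
```

The leading NA is kept because its position is not greater than the position of the last non-NA value.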

Then, assuming that you have a data.frame like so:

r <- data.frame(ID=c('A','B'),obs1=c(NA,1),obs2=c(0,NA),obs3=c(1,2),obs4=c(NA,3),obs5=c(NA,NA))
##  ID obs1 obs2 obs3 obs4 obs5
##1  A   NA    0    1   NA   NA
##2  B    1   NA    2    3   NA

We can apply the function over the rows for all numeric columns of r:

r[,-1] <- t(apply(r[,-1],1,replace.na,999))    
##  ID obs1 obs2 obs3 obs4 obs5
##1  A   NA    0    1  999  999
##2  B    1   NA    2    3  999

Here apply coerces r[,-1] to a matrix and calls replace.na on each row. Each returned vector becomes a column of the output, because apply fills its result matrix by columns. Therefore, we have to transpose the resulting matrix before assigning it back to the columns of r.
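The orientation of what apply returns over rows can be seen on a toy matrix. This is a small sketch; m and a are illustrative names, and identity is used so that any change in shape comes from apply itself.

```r
# Toy matrix with 2 rows and 5 columns (m is an illustrative name).
m <- matrix(1:10, nrow = 2)

# identity() returns each row unchanged.
a <- apply(m, 1, identity)

dim(a)      # 5 2: each row of m comes back as a column
dim(t(a))   # 2 5: transposing restores the original orientation
```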

Another way to call replace.na is:

r[,-1] <- do.call(rbind,lapply(data.frame(t(r[,-1])),replace.na,999))

Here, we first transpose the numeric columns of r and convert the result to a data.frame, so that each row of r becomes a column of the new data frame. Since a data frame is a list of columns, lapply applies replace.na to each of them, and rbind stacks the results back into rows.
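To check that the two ways of calling replace.na agree, their results can be compared directly. This is a sketch on a copy of the example data; df, via_apply and via_lapply are illustrative names, and the dimnames are dropped before comparing because the two calls label rows and columns differently.

```r
# Same definition as above, repeated so the snippet is self-contained.
replace.na <- function(r, val) {
  i <- is.na(r)
  j <- which(i)
  k <- which(!i)
  r[j[j > k[length(k)]]] <- val
  r
}

# Copy of the example data (df is an illustrative name).
df <- data.frame(ID = c("A", "B"), obs1 = c(NA, 1), obs2 = c(0, NA),
                 obs3 = c(1, 2), obs4 = c(NA, 3), obs5 = c(NA, NA))

via_apply  <- t(apply(df[, -1], 1, replace.na, 999))
via_lapply <- do.call(rbind, lapply(data.frame(t(df[, -1])), replace.na, 999))

# Compare the values only, ignoring row and column labels.
same <- isTRUE(all.equal(unname(via_apply), unname(via_lapply)))
print(same)
```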


If you instead want to flag all NAs after the first non-NA value, then the function replace.na should be:

replace.na <- function(r,val) {
  i <- is.na(r)
  j <- which(i)
  k <- which(!i)
  r[j[j > k[1]]] <- val    # replace every NA after the first non-NA
  r
}

Applying it to the data:

r[,-1] <- do.call(rbind,lapply(data.frame(t(r[,-1])),replace.na,999))
##  ID obs1 obs2 obs3 obs4 obs5
##1  A   NA    0    1  999  999
##2  B    1  999    2    3  999
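The question asked for the tag "Remove" rather than a number such as 999. A text tag can be used in the same way, with one caveat: a column holds a single type, so mixing "Remove" with numbers coerces the values to text. The sketch below assumes that is acceptable; replace.na.last, df and tagged are illustrative names, and the function is the last-non-NA version from the top of this answer.

```r
# Last-non-NA version, under an illustrative name so it does not clash
# with the redefined replace.na above.
replace.na.last <- function(r, val) {
  i <- is.na(r)
  j <- which(i)
  k <- which(!i)
  r[j[j > k[length(k)]]] <- val
  r
}

# Copy of the example data (df is an illustrative name).
df <- data.frame(ID = c("A", "B"), obs1 = c(NA, 1), obs2 = c(0, NA),
                 obs3 = c(1, 2), obs4 = c(NA, 3), obs5 = c(NA, NA))

# Build a new data frame; the observation columns now hold text.
tagged <- data.frame(ID = df$ID,
                     t(apply(df[, -1], 1, replace.na.last, "Remove")))
print(tagged)
```

If the numeric values are needed later, it may be simpler to keep a numeric sentinel such as 999, or to store the flags in a separate logical matrix.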