How do I delete rows with NAs and those that follow the NAs?


Within each level of a factor, I want to remove the rows that contain an NA and also every row that follows the first NA.

Removing the NA rows themselves is easy:

df <- data.frame(a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
                 b = c(0,1,0,0,0,0,0,1,0,0,0,1),
                 c = c(4,5,3,2,1,5,NA,5,1,6,NA,2))
df
# complete.cases() drops only the rows that contain an NA, not the rows after them
newdf <- df[complete.cases(df), ]
newdf

The final result should drop all of the rows for C (its first value of c is NA) and the last two rows of D.

Hope you can help.


There are 3 answers below.

BEST ANSWER

A classic split-apply-combine in base R:

do.call(rbind, lapply(split(df, df$a), function(x) x[cumsum(is.na(x$c)) < 1, ]))

Here it is again, but in several lines:

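# split: one data frame per level of a
split_df <- split(df, df$a)
# apply: keep only the rows before the first NA in column c
apply_df <- lapply(split_df, function(x) x[cumsum(is.na(x$c)) < 1, ])
# combine: stack the filtered pieces back into one data frame
combine_df <- do.call(rbind, apply_df)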
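To see what the mask `cumsum(is.na(x$c)) < 1` does inside one group, here is a minimal trace on group D's values of c (6, NA, 2) from the question's data. The variable names are illustrative only:

```r
x <- c(6, NA, 2)              # column c for group D

na_flags <- is.na(x)          # FALSE TRUE FALSE
na_count <- cumsum(na_flags)  # 0 1 1: running number of NAs seen so far (integer)
keep <- na_count < 1          # TRUE FALSE FALSE: only rows before the first NA

keep
```

Once the running count reaches 1 it can never drop back to 0, which is why every row after the first NA is discarded along with it.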

The result:

> do.call(rbind,lapply(split(df, df$a),function(x)x[cumsum(is.na(x$c))<1,]))
#    a b c
#A.1 A 0 4
#A.2 A 1 5
#A.3 A 0 3
#B.4 B 0 2
#B.5 B 0 1
#B.6 B 0 5
#D   D 0 6
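A base R variant of the same idea (not part of the answer above, just a sketch) uses `ave()` to compute the running NA count per group without splitting and re-binding, so the original row order and row names are kept:

```r
# Sample data from the question
df <- data.frame(a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
                 b = c(0,1,0,0,0,0,0,1,0,0,0,1),
                 c = c(4,5,3,2,1,5,NA,5,1,6,NA,2))

# Running count of NAs in column c, restarted within each level of a.
# as.integer() makes the input numeric so ave() returns counts, not logicals.
na_count <- ave(as.integer(is.na(df$c)), df$a, FUN = cumsum)

# Keep only the rows that come before the first NA of their group
kept <- df[na_count == 0, ]
kept
```

The result contains the same seven rows (1 to 6 and 10), but with the original row names instead of the `A.1`, `D`, ... names produced by `rbind`.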

ANSWER

We can try with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'a', take the cumulative sum of the logical vector of NA elements in 'c', and keep the rows where that sum is less than 1.

library(data.table)
setDT(df)[, .SD[cumsum(is.na(c)) < 1], by = a]

Or, a faster option: use .I to return, for each group, the row indices where the condition is TRUE, and then subset the rows by those indices.

setDT(df)[df[, .I[cumsum(is.na(c)) < 1], by = a]$V1]
#   a b c
#1: A 0 4
#2: A 1 5
#3: A 0 3
#4: B 0 2
#5: B 0 1
#6: B 0 5
#7: D 0 6
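For readers without data.table, the `.I` idea (collect the global row numbers to keep per group, then subset once) can be mimicked in base R. This is a hypothetical analogue written for illustration, not data.table's API:

```r
# Sample data from the question
df <- data.frame(a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
                 b = c(0,1,0,0,0,0,0,1,0,0,0,1),
                 c = c(4,5,3,2,1,5,NA,5,1,6,NA,2))

# Split the global row numbers 1..nrow(df) by group (plays the role of .I by = a)
rows_by_group <- split(seq_len(nrow(df)), df$a)

# Within each group, keep the row numbers that come before the first NA in c
keep_idx <- unlist(
  lapply(rows_by_group, function(i) i[cumsum(is.na(df$c[i])) < 1]),
  use.names = FALSE
)

# One subset at the end, like setDT(df)[...$V1]
df[keep_idx, ]
```

Subsetting once by an integer index avoids building and re-binding one data frame per group, which is the same reason the `.I` version is faster than `.SD`.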

ANSWER

A similar solution in dplyr:

library(dplyr)
df %>% group_by(a) %>% filter(!is.na(cumsum(c)))

Output:

Source: local data frame [7 x 3]
Groups: a [3]

       a     b     c
  <fctr> <dbl> <dbl>
1      A     0     4
2      A     1     5
3      A     0     3
4      B     0     2
5      B     0     1
6      B     0     5
7      D     0     6

The cumulative sum of column c becomes NA at the first NA and stays NA for every later element. Computing it within each group and keeping only the rows where it is not NA therefore removes each NA row together with everything that follows it in that group.
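A minimal base R check of that NA propagation, using group D's values of c. The character vector `y` is a hypothetical extra column added only to show a limitation of the trick:

```r
x <- c(6, NA, 2)                  # column c for group D

cumsum(x)                         # 6 NA NA: the NA propagates to every later element
!is.na(cumsum(x))                 # TRUE FALSE FALSE: the mask used by filter()

# Equivalent mask that does not rely on arithmetic on the values
cumsum(is.na(x)) == 0             # TRUE FALSE FALSE

# For a non-numeric column, cumsum(y) would raise an error,
# but the is.na() form still works
y <- c("x", NA, "z")
cumsum(is.na(y)) == 0             # TRUE FALSE FALSE
```

So `filter(!is.na(cumsum(c)))` is fine here because c is numeric; `filter(cumsum(is.na(c)) == 0)` expresses the same rule for a column of any type.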