How do I delete rows with NAs and those that follow the NAs?


107 Views Asked by adkane At 02 April 2025 at 05:54

I have some data where, within each level of a factor, I want to remove the rows that contain NAs and every row that follows an NA.

Removing the NAs is easy:

df <- data.frame(a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
                 b = c(0,1,0,0,0,0,0,1,0,0,0,1),
                 c = c(4,5,3,2,1,5,NA,5,1,6,NA,2))
df
newdf <- df[complete.cases(df), ]  # drops only the rows that contain an NA
newdf

The final result, however, should also drop the rows that come after an NA within each group: all of the rows for C and the final two rows of D.

Hope you can help.

There are 3 best solutions below

akrun On 19 December 2016 at 12:57

We can try data.table. Convert the data.frame to a data.table with setDT(df), group by 'a', take the cumulative sum of the logical vector is.na(c), and keep the rows where that sum is still less than 1, i.e. the rows before the first NA in each group.

library(data.table)
setDT(df)[,  .SD[cumsum(is.na(c))<1], by= a]

A faster option uses .I to get the row indices that satisfy the condition within each group, then subsets the rows with those indices.

setDT(df)[df[, .I[cumsum(is.na(c)) < 1], by = a]$V1]
#   a b c
#1: A 0 4
#2: A 1 5
#3: A 0 3
#4: B 0 2
#5: B 0 1
#6: B 0 5
#7: D 0 6
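
To see why the `cumsum(is.na(c)) < 1` test works, it can help to look at the intermediate vector. The following is only a minimal base R sketch of the same idea, using `ave()` in place of data.table grouping; the names `na_so_far` and `kept` are illustrative and not from the answer.

```r
df <- data.frame(
  a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
  b = c(0,1,0,0,0,0,0,1,0,0,0,1),
  c = c(4,5,3,2,1,5,NA,5,1,6,NA,2)
)

# Running count of NAs seen so far within each level of `a`
na_so_far <- ave(as.integer(is.na(df$c)), df$a, FUN = cumsum)
na_so_far
# 0 0 0 0 0 0 1 1 1 0 1 1

# Keep only the rows where no NA has been seen yet in the group
kept <- df[na_so_far < 1, ]
kept
```

The count is 0 up to the row before the first NA in a group and at least 1 from that NA onward, so `< 1` selects exactly the rows that precede the first NA.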

Joe On 19 December 2016 at 13:24

A similar solution in dplyr would be

library(dplyr)
df %>% group_by(a) %>% filter(!is.na(cumsum(c)))

Output:

Source: local data frame [7 x 3]
Groups: a [3]

       a     b     c
  <fctr> <dbl> <dbl>
1      A     0     4
2      A     1     5
3      A     0     3
4      B     0     2
5      B     0     1
6      B     0     5
7      D     0     6

The cumulative sum of column c becomes NA at the first NA and stays NA for every later value in the group. Filtering on !is.na(cumsum(c)) within each group therefore drops the NA row and every row after it, which gives the desired output.
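
As a small illustration of that NA propagation, here is a sketch using just the `c` values of group D from the question (the names `d_values` and `keep` are made up for this example):

```r
# Values of column c for group D in the question's data
d_values <- c(6, NA, 2)

cumsum(d_values)
# 6 NA NA  (the sum is NA from the first NA onward)

# TRUE only for the positions before the first NA
keep <- !is.na(cumsum(d_values))
d_values[keep]
# 6
```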

**plannapus** · Accepted Answer

A classic split-apply-combine in base R:

do.call(rbind,lapply(split(df, df$a),function(x)x[cumsum(is.na(x$c))<1,]))

Here it is again, but in several lines:

split_df <- split(df, df$a)
apply_df <- lapply(split_df, function(x)x[cumsum(is.na(x$c))<1,])
combine_df <- do.call(rbind, apply_df)

The result:

> do.call(rbind,lapply(split(df, df$a),function(x)x[cumsum(is.na(x$c))<1,]))
#    a b c
#A.1 A 0 4
#A.2 A 1 5
#A.3 A 0 3
#B.4 B 0 2
#B.5 B 0 1
#B.6 B 0 5
#D   D 0 6
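
The combined result carries row names built from the group label and the original row number. If plain sequential row names are preferred, one option (an addition here, not part of the original answer) is to reset them after combining:

```r
df <- data.frame(
  a = c("A","A","A","B","B","B","C","C","C","D","D","D"),
  b = c(0,1,0,0,0,0,0,1,0,0,0,1),
  c = c(4,5,3,2,1,5,NA,5,1,6,NA,2)
)

# Split by group, keep the rows before the first NA, then recombine
split_df   <- split(df, df$a)
apply_df   <- lapply(split_df, function(x) x[cumsum(is.na(x$c)) < 1, ])
combine_df <- do.call(rbind, apply_df)

# Replace the "A.1", "A.2", ... row names with 1, 2, ...
rownames(combine_df) <- NULL
combine_df
```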
