How to exclude subjects with a specific proportion of NA in two columns


I have a dataset with columns for subject ID, segment (1-42), and two columns of amplitudes (microvolts, t-transformed). The research aim is to compare scoring/quantification methods for electromyogram data scored manually vs. automatically, which is why there are two amplitude columns. For now, my goal is to exclude participants (column "vp") whose proportion of NAs is ≥ 20% in either of the two columns. Alternatively, it would be enough to print the subject IDs of the participants who meet that criterion. I had the idea of using an if-else statement, but I can't wrap my head around how to define the criterion.

Here are the first rows of my dataset; there are 42 rows per subject.

structure(list(
  vp = c("AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU", "AD_001_B_NPU"),
  seg = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  t_amp_manual = c(70.6, 81.4, 58.1, 78.1, 59.2, 55.1, 55.1, 62.2, 59.7),
  t_amp_automatic = c(73.7, NA, 59.8, 82.9, 62.7, NA, 53.6, 65.0, 63.3)
), row.names = c(NA, -9L), class = "data.frame")

There are 2 best solutions below


Untested because I couldn't get your data working, but this should work:

library(dplyr)

npu_Kopie %>%
  # `vp` is assumed to be the subject ID column
  group_by(vp) %>%
  # keep only subjects with fewer than 20% NA in both amplitude columns
  filter(
    mean(is.na(t_amp_manual)) < 0.2,
    mean(is.na(t_amp_automatic)) < 0.2
  ) %>%
  ungroup()
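
The question also asks how to simply print the IDs of the subjects who meet the exclusion criterion. A minimal sketch of that, assuming dplyr is available; the data frame `emg` and the subject IDs "S01" and "S02" are made up for illustration (10 segments per subject instead of 42):

```r
library(dplyr)

# Hypothetical toy data: 2 subjects with 10 segments each.
emg <- data.frame(
  vp = rep(c("S01", "S02"), each = 10),
  seg = rep(1:10, times = 2),
  t_amp_manual = c(rep(50, 10), NA, rep(55, 9)),
  t_amp_automatic = c(NA, NA, rep(60, 8), rep(58, 10)),
  stringsAsFactors = FALSE
)

# Proportion of NA per subject, computed separately for each amplitude column.
na_share <- emg %>%
  group_by(vp) %>%
  summarise(
    na_manual = mean(is.na(t_amp_manual)),
    na_automatic = mean(is.na(t_amp_automatic)),
    .groups = "drop"
  )

# Subjects with >= 20% NA in either column (the exclusion criterion).
excluded_ids <- na_share %>%
  filter(na_manual >= 0.2 | na_automatic >= 0.2) %>%
  pull(vp)

print(excluded_ids)
```

In this toy data "S01" has exactly 2 of 10 automatic values missing, so it sits on the 20% boundary and is flagged, while "S02" (1 of 10 manual values missing) is not.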

It's not the cleanest code, but it can probably solve your problem. It drops every column in which more than 20% of the values are NA.

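df <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(NA, 2, 3, NA, 5, NA),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(1, NA, NA, NA, 5, 6),
  e = c(1, 2, 3, NA, NA, 6),
  f = c(1, NA, NA, NA, 5, 6)
)

# Loop over the columns in reverse order so that dropping a column
# does not shift the indices of the columns still to be checked.
for (i in rev(seq_along(df))) {
  # Drop column i if more than 20% of its values are NA.
  if (mean(is.na(df[[i]])) > 0.2) {
    df[[i]] <- NULL
  }
}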
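
The loop above works column by column, whereas the question asks for a per-subject criterion. A rough base R sketch of the per-subject version, using `tapply()` to get the NA proportion for each subject; the data frame `emg` and the IDs "S01" and "S02" are invented for illustration and are not the asker's data:

```r
# Hypothetical toy data: 2 subjects with 10 segments each.
emg <- data.frame(
  vp = rep(c("S01", "S02"), each = 10),
  seg = rep(1:10, times = 2),
  t_amp_manual = c(rep(50, 10), NA, rep(55, 9)),
  t_amp_automatic = c(NA, NA, rep(60, 8), rep(58, 10)),
  stringsAsFactors = FALSE
)

# Proportion of NA per subject in each amplitude column.
na_manual <- tapply(is.na(emg$t_amp_manual), emg$vp, mean)
na_automatic <- tapply(is.na(emg$t_amp_automatic), emg$vp, mean)

# Keep subjects with fewer than 20% NA in both columns.
keep_ids <- names(na_manual)[na_manual < 0.2 & na_automatic < 0.2]
emg_clean <- emg[emg$vp %in% keep_ids, ]

# Subjects that were excluded (>= 20% NA in either column).
print(setdiff(unique(emg$vp), keep_ids))
```

Both `tapply()` calls group by the same `vp` vector, so their results are in the same subject order and can be compared element-wise.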