I have a dataset in which each person has multiple visits, and Event1 and Event2 were documented at each visit (1=yes, 0=no). Not everyone experienced Event1 or Event2. How do I pick out people whose Event1 disappeared after, or at the same time as, their Event2 occurred (Event2 only needs to occur once)?
ID <- c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,5,6,6,6,6,6,6)
Visit <- c(1,2,3,4,5,1,2,3,1,2,1,2,3,4,1,2,3,1,2,3,4,5,6)
Event1 <- c(0,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,0,0,0,0,1,0,0)
Event2 <- c(0,1,1,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1)
df <- data.frame(ID, Visit, Event1, Event2)
In this case, ID=1,2,4 had Event1 going away after/at the same time Event2 happened. So I want to keep these 3 people and all their observations.
base R
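As a minimal sketch of the `ave` approach discussed in this section (not necessarily the original code): the helper name `keep_id` is hypothetical, and the rule it encodes is an assumption inferred from the expected result (IDs 1, 2 and 4). It assumes that Event1 must first appear no later than Event2's first occurrence, that "disappeared" means the first visit after the last Event1, and that this visit must fall at or after Event2's first occurrence.

```r
# Example data from the question.
df <- data.frame(
  ID     = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,5,6,6,6,6,6,6),
  Visit  = c(1,2,3,4,5,1,2,3,1,2,1,2,3,4,1,2,3,1,2,3,4,5,6),
  Event1 = c(0,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,0,0,0,0,1,0,0),
  Event2 = c(0,1,1,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1)
)

# Hypothetical helper: one TRUE/FALSE per person (assumed rule, see above).
keep_id <- function(visit, e1, e2) {
  if (!any(e1 == 1) || !any(e2 == 1)) return(FALSE)
  o <- order(visit)                 # work in visit order
  e1 <- e1[o]; e2 <- e2[o]
  first1 <- min(which(e1 == 1))     # first visit with Event1
  last1  <- max(which(e1 == 1))     # last visit with Event1
  first2 <- min(which(e2 == 1))     # first visit with Event2
  first1 <= first2 &&               # Event1 present by Event2's onset
    last1 < length(e1) &&           # Event1 is absent at a later visit
    last1 + 1 >= first2             # that absence is at/after Event2's onset
}

# Pass row numbers as the first argument so FUN can look at several columns.
keep <- ave(seq_len(nrow(df)), df$ID, FUN = function(i) {
  rep(keep_id(df$Visit[i], df$Event1[i], df$Event2[i]), length(i))
})

# 'keep' is integer (0/1) because the first argument to ave() was integer.
kept <- df[keep == 1, ]
unique(kept$ID)
```

With the question's data this keeps every row for IDs 1, 2 and 4. ID 6 is dropped only because of the assumed "Event1 starts no later than Event2" condition, so adjust `keep_id` if your definition differs.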
We can use `ave` for this grouping op.
Most of the pain here is the inner logic (which is shared with the `dplyr` solution below), but one possibly-annoying nuance to keep in mind with `ave` is that the return value is always the same class as the first argument. That is, even if `FUN` returns `logical`, since the first argument is `integer`, the return value is cast/coerced to integer.
Normally `ave` has some form of "data" as its first argument, but since we need to reference two vectors (two columns in the frame), we need to instead use the row number so that we can reference more than one column inside the `FUN`ction.
dplyr
This perhaps reads a little more easily.
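A sketch of how the grouped filter might look in `dplyr`, again not necessarily the original code. The helper `keep_id` is a hypothetical name and encodes an assumed rule inferred from the expected result: Event1 first appears no later than Event2's first occurrence, and the first visit after the last Event1 falls at or after Event2's first occurrence.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  ID     = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,5,6,6,6,6,6,6),
  Visit  = c(1,2,3,4,5,1,2,3,1,2,1,2,3,4,1,2,3,1,2,3,4,5,6),
  Event1 = c(0,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,0,0,0,0,1,0,0),
  Event2 = c(0,1,1,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1)
)

# Hypothetical helper: one TRUE/FALSE per person (assumed rule, see above).
keep_id <- function(visit, e1, e2) {
  if (!any(e1 == 1) || !any(e2 == 1)) return(FALSE)
  o <- order(visit)
  e1 <- e1[o]; e2 <- e2[o]
  first1 <- min(which(e1 == 1))
  last1  <- max(which(e1 == 1))
  first2 <- min(which(e2 == 1))
  first1 <= first2 && last1 < length(e1) && last1 + 1 >= first2
}

# The single TRUE/FALSE is recycled across each person's rows.
kept_dplyr <- df %>%
  filter(keep_id(Visit, Event1, Event2), .by = ID)

unique(kept_dplyr$ID)
```

Because the condition is evaluated once per `ID`, either all of a person's visits are kept or none are.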
Note that `.by=` requires `dplyr_1.1.0` or newer; if you have an older version, change to an explicit `group_by(ID)` before the `filter()` (same results).
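For older `dplyr` versions, the grouping might be written with `group_by()` as sketched here. As before, `keep_id` is a hypothetical helper whose rule is an assumption inferred from the expected result (IDs 1, 2 and 4), not something stated in the original answer.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  ID     = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,5,6,6,6,6,6,6),
  Visit  = c(1,2,3,4,5,1,2,3,1,2,1,2,3,4,1,2,3,1,2,3,4,5,6),
  Event1 = c(0,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,0,0,0,0,1,0,0),
  Event2 = c(0,1,1,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1)
)

# Hypothetical helper: same assumed rule as described above.
keep_id <- function(visit, e1, e2) {
  if (!any(e1 == 1) || !any(e2 == 1)) return(FALSE)
  o <- order(visit)
  e1 <- e1[o]; e2 <- e2[o]
  on1 <- which(e1 == 1)
  min(on1) <= min(which(e2 == 1)) &&
    max(on1) < length(e1) &&
    max(on1) + 1 >= min(which(e2 == 1))
}

# group_by() + ungroup() in place of filter(..., .by = ID).
kept_old <- df %>%
  group_by(ID) %>%
  filter(keep_id(Visit, Event1, Event2)) %>%
  ungroup()

unique(kept_old$ID)
```

The one visible difference is that this version returns a tibble, since `group_by()` converts the data frame along the way.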
Data