I am trying to use R (tidyverse) to perform something like the following:
Suppose I have a data set with a subject ID, Visit Code, (say, 1 to 8 visits), Date of visit, demographics (age, sex, etc.), and two test results (test A and test B).
Test A was administered from the beginning of the study, but not necessarily at every visit. Test B began later (most commonly at Visit 5, but for some people at other visits).
I want to make a cross-sectional dataset corresponding to the first time each person performed test B (for most people that will be at Visit 5, but for others it will be another visit). I also want everyone to have a test A score taken within 1 year (+/-) of when test B was done. Many people will have a test A score at the same visit as test B, but some people don't have a test A score for the visit with their first test B score. In that case, I want to take the nearest test A score from another visit, but only if it was within a year of the test B score.
I see people publishing work using this approach, but can you help me figure out how to code this to get this type of dataset from a master dataset?
To clarify the question, I put together a simple example, along with the output I am hoping to get:
mydata <- data.frame(Id=c(1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3),
VISIT=c(1,2,3,4,5,6,7,8, 1,2,3,4,5,6,7,8, 1,2,3,4,5,6,7,8),
Time=c(0,1.1,1.9,3,4,5.1,6.1,6.9, 0,.9,2.1,3.1,4.1,5,6.1,7.2, 0,1,2.1,3.2,3.9,5.1,6,7.1),
Score_A=c(10,9,9,8,7,10,10,8, 5,9,4,3,NA,13,14,18, 9,9,10,11,NA,14,12,13),
Score_B=c(NA,NA,NA,NA,100,NA,90,NA, NA,NA,NA,NA,80,NA,99,NA, NA,NA,NA,NA,75,NA,97,NA) )
mydata
desired_output <- data.frame(Id=c(1,2,3), Score_A=c(7,13,11), Score_B=c(100,80,75))
I tried the following, but it's not accounting for the +/- 1 year so Person 2 has an NA for the Score_A:
library(dplyr)

# Keep the first visit with a test B score for each subject
Q <- mydata %>%
  group_by(Id) %>%
  arrange(Time, .by_group = TRUE) %>%
  filter(!is.na(Score_B)) %>%
  slice(1)
Thanks!
Here is a data.table approach using a rolling join. It can also be done using dplyr joins (take a look at the help file on join_by()), but I feel more familiar with data.table.
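The answer's own code is not shown here, so the following is only a minimal sketch of how a data.table rolling join could be applied to the example data from the question. The helper names first_b, a_obs, Time_A and result are made up for illustration. The sketch assumes Time is measured in years and that only non-missing test A scores should be candidates for the match.

```r
library(data.table)

# Example data copied from the question
mydata <- data.frame(
  Id      = rep(c(1, 2, 3), each = 8),
  VISIT   = rep(1:8, times = 3),
  Time    = c(0, 1.1, 1.9, 3, 4, 5.1, 6.1, 6.9,
              0, 0.9, 2.1, 3.1, 4.1, 5, 6.1, 7.2,
              0, 1, 2.1, 3.2, 3.9, 5.1, 6, 7.1),
  Score_A = c(10, 9, 9, 8, 7, 10, 10, 8,
              5, 9, 4, 3, NA, 13, 14, 18,
              9, 9, 10, 11, NA, 14, 12, 13),
  Score_B = c(NA, NA, NA, NA, 100, NA, 90, NA,
              NA, NA, NA, NA, 80, NA, 99, NA,
              NA, NA, NA, NA, 75, NA, 97, NA)
)
dt <- as.data.table(mydata)

# First visit with a test B score for each subject
first_b <- dt[!is.na(Score_B),
              .(Time = min(Time), Score_B = Score_B[which.min(Time)]),
              by = Id]

# All visits with a test A score. Time_A keeps the test A visit time,
# because the join column Time takes the test B time in the joined result.
a_obs <- dt[!is.na(Score_A), .(Id, Time, Time_A = Time, Score_A)]

# For each first test B visit, roll to the nearest test A visit of the same subject
result <- a_obs[first_b, on = .(Id, Time), roll = "nearest"]

# roll = "nearest" has no distance limit, so blank out matches more than 1 year away
result[abs(Time_A - Time) > 1, Score_A := NA_real_]

result <- result[, .(Id, Score_A, Score_B)]
print(result)
```

On the example data this should give Score_A values of 7, 13 and 11 for subjects 1 to 3, matching desired_output. The 1-year cutoff is applied after the join; a gap of exactly 1 year may fall on either side of the cutoff because of floating-point rounding.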
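Since the question asks for a tidyverse solution, here is a rough dplyr sketch of the same idea. It does not use join_by(); instead it uses an ordinary join on Id followed by a filter on the time gap, which is a simpler substitute rather than the approach the answer refers to. The names first_b, nearest_a, Time_A, Time_B and gap are illustrative, and Time is again assumed to be in years.

```r
library(dplyr)

# Example data copied from the question
mydata <- data.frame(
  Id      = rep(c(1, 2, 3), each = 8),
  VISIT   = rep(1:8, times = 3),
  Time    = c(0, 1.1, 1.9, 3, 4, 5.1, 6.1, 6.9,
              0, 0.9, 2.1, 3.1, 4.1, 5, 6.1, 7.2,
              0, 1, 2.1, 3.2, 3.9, 5.1, 6, 7.1),
  Score_A = c(10, 9, 9, 8, 7, 10, 10, 8,
              5, 9, 4, 3, NA, 13, 14, 18,
              9, 9, 10, 11, NA, 14, 12, 13),
  Score_B = c(NA, NA, NA, NA, 100, NA, 90, NA,
              NA, NA, NA, NA, 80, NA, 99, NA,
              NA, NA, NA, NA, 75, NA, 97, NA)
)

# First visit with a test B score for each subject
first_b <- mydata %>%
  filter(!is.na(Score_B)) %>%
  group_by(Id) %>%
  slice_min(Time, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Id, Time_B = Time, Score_B)

# Nearest non-missing test A score within 1 year of that visit
nearest_a <- mydata %>%
  filter(!is.na(Score_A)) %>%
  select(Id, Time_A = Time, Score_A) %>%
  inner_join(first_b, by = "Id") %>%
  mutate(gap = abs(Time_A - Time_B)) %>%
  filter(gap <= 1) %>%
  group_by(Id) %>%
  slice_min(gap, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Id, Score_A)

# Subjects with no test A score inside the window keep Score_A = NA
result <- first_b %>%
  left_join(nearest_a, by = "Id") %>%
  select(Id, Score_A, Score_B)

print(result)
```

The left join at the end keeps every subject who has a test B score, so anyone without a test A score inside the window still appears with a missing Score_A rather than being dropped.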