Remove rows that are within 12 months of each other for each participant across 15 years

97 Views Asked by At

I am new to R.

I have a large dataset that goes over 15 years (2007-2022) (see below an example portion). Essentially, I want to create a 12-month rolling window and remove the rows within 12 months of the first collection date for each subject. For instance, subject 1002 has 3 collection time points: 1st Jan 2007, 1st June 2007 and 1st Jan 2008. Using the 12-month window criteria from the first day (1st Jan 2007), I want to remove rows that contain 1st June 2007 (less than 12 months since the first day) and keep the 1st Jan 2008 (12 months from the first day). This is also the same for 1004, where the rows with the date 1st Jan 2013 is within 12 months of the previous date 1st of Feb 2012.

df<- tribble(~Subject_ID,~Collection_ID,~Date,~Sample,~Test,~Status,
            1001,1119139,"2007-01-01","E","A","N",1001,1119139,"2007-01-01","E","A","N",1001,1119139,"2007-01-01","E","A","N",1001,1119139,"2007-01-01","E","A","N",1001,1119139,"2007-01-01","E","A","N",1002,1023347,"2007-01-01","E","A","N", 1002,1023347,"2007-01-01","E","A","N", 1002,1023347,"2007-01-01","E","A","N",1002,1023347,"2007-01-01","E","A","N",1002,1023347,"2007-01-01","E","A","N",1002,1023347,"2007-01-01","E","A","N",1002,1129086,"2007-06-01","E","A","N",1002,1129086,"2007-06-01","E","A","N",1002,1129086,"2007-06-01","E","A","N",1002,1129086,"2007-06-01","E","A","N",1002,1129086,"2007-06-01","E","A","N",1002,1105112,"2008-01-01","E","A","N",1002,1105112,"2008-01-01","E","A","N",1002,1105112,"2008-01-01","E","A","N",1002,1105112,"2008-01-01","E","A","N",1002,1105112,"2008-01-01","E","A","N",1003,1057321,"2010-03-01","E","A","N",1003,1057321,"2010-03-01","E","A","N",1003,1057321,"2010-03-01","E","A","N",1003,1057321,"2010-03-01","E","A","N",1003,1057321,"2010-03-01","E","A","N",1004,1022691,"2012-02-01","E","A","N",1004,1022691,"2012-02-01","E","A","N",1004,1022691,"2012-02-01","E","A","N",1004,1022691,"2012-02-01","E","A","N",1004,1022691,"2012-02-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",1004,1070101,"2013-01-01","E","A","N",)

My desired output is the following:

enter image description here

Many thanks in advance!

1

There are 1 best solutions below

0
On

With package lubridate, you can remove rows that are not in the beginning of the year:

library(tidyverse)
library(lubridate)

df %>% 
  filter(ymd(Date) == floor_date(ymd(Date), "year"))

# A tibble: 22 x 6
   Subject_ID Collection_ID Date       Sample Test  Status
        <dbl>         <dbl> <chr>      <chr>  <chr> <chr> 
 1       1001       1119139 2007-01-01 E      A     N     
 2       1001       1119139 2007-01-01 E      A     N     
 3       1001       1119139 2007-01-01 E      A     N     
 4       1001       1119139 2007-01-01 E      A     N     
 5       1001       1119139 2007-01-01 E      A     N     
 6       1002       1023347 2007-01-01 E      A     N     
 7       1002       1023347 2007-01-01 E      A     N     
 8       1002       1023347 2007-01-01 E      A     N     
 9       1002       1023347 2007-01-01 E      A     N     
10       1002       1023347 2007-01-01 E      A     N     
11       1002       1023347 2007-01-01 E      A     N     
12       1002       1105112 2008-01-01 E      A     N     
13       1002       1105112 2008-01-01 E      A     N     
14       1002       1105112 2008-01-01 E      A     N     
15       1002       1105112 2008-01-01 E      A     N     
16       1002       1105112 2008-01-01 E      A     N     
17       1004       1070101 2013-01-01 E      A     N     
18       1004       1070101 2013-01-01 E      A     N     
19       1004       1070101 2013-01-01 E      A     N     
20       1004       1070101 2013-01-01 E      A     N     
21       1004       1070101 2013-01-01 E      A     N     
22       1004       1070101 2013-01-01 E      A     N