Moving entries within unique ID, using loop

275 Views Asked by At

I am preparing a dataset for a survival analysis and encountered this problem, using R. I was wondering if anyone "for loop" expert could help me solving this issue. Any suggestion would be greatly appreciated.

The dataset was long and has only three columns: ID, date, and a numerical variable. Sample data looks like this:

ID<-c(1,1,1,1,2,2,3)
date<-c("05/11/2012","01/30/2013",
"06/14/2015","07/22/2016","01/07/2011","02/02/2012","02/03/2017")
NV<-c(12,22,14,5,17,6,25)
dat<-as.data.frame(cbind(x,y,z));dat

What I need to have is:

  1. Create a additional column "event_date"
  2. Within each unique ID, move the "date" in the second row to the "event_date" in the previous row, third "date" to the second "event_date"...so on and so forth. If it is the last row for this ID, then print "stop".
  3. If certain ID only has one row, then also print "stop".

The desired outcome is like this:

  ID       date NV event_date
1  1 05/11/2012 12 01/30/2013
2  1 01/30/2013 22 06/14/2015
3  1 06/14/2015 14 07/22/2016
4  1 07/22/2016  5       stop
5  2 01/07/2011 17 02/02/2012
6  2 02/02/2012  6       stop
7  3 02/03/2017 25       stop

It seems to be a easy switch, but forgive my poor skill, things got even more confusing as I was coding by myself. Initially, I have tried conditional "for" loop, subset in loops, and even double loops. Nothing quite works. And I really like to learn more about loops.

Thanks to all who contributes any thoughts.

2

There are 2 best solutions below

1
On

This is easily done with dplyr.

By the way, your data.frame line should be...

dat <- data.frame(ID, date, NV, stringsAsFactors = FALSE)

as x, y, and z are not defined, and you probably don't want your date strings as factors.

library(dplyr)
dat <- dat %>% group_by(ID) %>% mutate(event_date=lead(date, default = "stop"))

dat
# A tibble: 7 x 4
# Groups:   ID [3]
     ID       date    NV event_date
  <chr>      <chr> <chr>      <chr>
1     1 05/11/2012    12 01/30/2013
2     1 01/30/2013    22 06/14/2015
3     1 06/14/2015    14 07/22/2016
4     1 07/22/2016     5       stop
5     2 01/07/2011    17 02/02/2012
6     2 02/02/2012     6       stop
7     3 02/03/2017    25       stop

Note that putting the word "stop" in the new column will force it to be formatted as character. If you want to format it as dates, you might be better off using NA.

2
On

Your data.frame is not well defined, it should be (note that there's no need for cbind).

dat <- data.frame(ID, date, NV)
dat

Now, with base R only.

res <- lapply(split(dat, dat$ID), function(x){
        x$event_date <- c(as.character(x$date)[-1], "stop")
        x
    })
res <- do.call(rbind, res)
row.names(res) <- NULL
res
  ID       date NV event_date
1  1 05/11/2012 12 01/30/2013
2  1 01/30/2013 22 06/14/2015
3  1 06/14/2015 14 07/22/2016
4  1 07/22/2016  5       stop
5  2 01/07/2011 17 02/02/2012
6  2 02/02/2012  6       stop
7  3 02/03/2017 25       stop