I am preparing a dataset for a survival analysis and encountered this problem, using R. I was wondering if anyone "for loop" expert could help me solving this issue. Any suggestion would be greatly appreciated.
The dataset was long and has only three columns: ID, date, and a numerical variable. Sample data looks like this:
ID<-c(1,1,1,1,2,2,3)
date<-c("05/11/2012","01/30/2013",
"06/14/2015","07/22/2016","01/07/2011","02/02/2012","02/03/2017")
NV<-c(12,22,14,5,17,6,25)
dat<-as.data.frame(cbind(x,y,z));dat
What I need to have is:
- Create a additional column "event_date"
- Within each unique ID, move the "date" in the second row to the "event_date" in the previous row, third "date" to the second "event_date"...so on and so forth. If it is the last row for this ID, then print "stop".
- If certain ID only has one row, then also print "stop".
The desired outcome is like this:
ID date NV event_date
1 1 05/11/2012 12 01/30/2013
2 1 01/30/2013 22 06/14/2015
3 1 06/14/2015 14 07/22/2016
4 1 07/22/2016 5 stop
5 2 01/07/2011 17 02/02/2012
6 2 02/02/2012 6 stop
7 3 02/03/2017 25 stop
It seems to be a easy switch, but forgive my poor skill, things got even more confusing as I was coding by myself. Initially, I have tried conditional "for" loop, subset in loops, and even double loops. Nothing quite works. And I really like to learn more about loops.
Thanks to all who contributes any thoughts.
This is easily done with
dplyr
.By the way, your data.frame line should be...
as x, y, and z are not defined, and you probably don't want your date strings as factors.
Note that putting the word "stop" in the new column will force it to be formatted as character. If you want to format it as dates, you might be better off using
NA
.