Time Lag based on another variable


Given:

test <- data.frame(Participant= c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
                   Day = c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9),
                   Value= c(1:30))

I want to arrive at:

test <- data.frame(Participant= c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
                   Day = c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9),
                   Value= c(1:30),
                   LaggedValue= c(NA, 1,2,3,4,5,6,7,8,9, NA, 11,12,13,14,15,16,17,18,19, NA, 21,22,23,24,25,26,27,28,29))

I have tried the following, which lags the variable but does so across the entire column. I'd like the lag to respect the Participant or Day variable, so that it returns NA whenever it encounters a new participant number or Day = 0:

test$LaggedValue <- c(NA, test$Value[seq_along(test$Value) -1])

I'm not sure how I can add an "if" statement or base it on the Participant/Day variable. Would a nest() function possibly work here?

There are 3 best solutions below

1
On

To apply an operation within each group, the dplyr library (or base R's by()) is what you need; something like the following (I don't have access to an R interpreter right now):

library(dplyr)
test %>%
    group_by(Participant) %>%
    mutate(LaggedValue = lag(Value)) %>%  # dplyr::lag(), applied within each participant
    ungroup()

This is the well-known split-apply-combine pattern. Don't try to hack it up with if statements.

EDIT: or data.table package, as per Gary's answer
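
The same split-apply-combine idea can also be sketched in base R without extra packages. This is only a minimal sketch, and it assumes the rows are already ordered by Day within each Participant. ave() splits Value by Participant, applies the anonymous function to each piece, and returns the results in the original row order.

```r
# Rebuild the example data from the question
test <- data.frame(Participant = rep(1:3, each = 10),
                   Day = rep(0:9, times = 3),
                   Value = 1:30)

# Within each participant, put NA first and drop the last value
test$LaggedValue <- ave(test$Value, test$Participant,
                        FUN = function(v) c(NA, head(v, -1)))

head(test, 12)
```

Because the NA is added per group, the first row of every participant is NA regardless of where that participant starts in the data frame.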

1
On

Using the data.table package you can do this very quickly with a grouped assignment and the special .N built-in variable (the number of rows in the current group):

library(data.table)
test <- data.frame(Participant= c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
                   Day = c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9),
                   Value= c(1:30))

# convert dataframe to data.table
test_dt <- as.data.table(test)

# Now insert your lagged value, with NA for the first row of each participant
# (Value[-.N] drops the last value of the group, so everything shifts down by one)
test_dt[, LaggedValue := c(NA, Value[-.N]), by = Participant]

# And just in case you missed a Day 0
test_dt[Day == 0, LaggedValue := NA]

# Or in a single step based on @thelatemail's comment below
test_dt[, LaggedValue := shift(Value), by=Participant]

Which gives the answer:

test_dt

    Participant Day Value LaggedValue
 1:           1   0     1          NA
 2:           1   1     2           1
 3:           1   2     3           2
 4:           1   3     4           3
 5:           1   4     5           4
 6:           1   5     6           5
 7:           1   6     7           6
 8:           1   7     8           7
 9:           1   8     9           8
10:           1   9    10           9
11:           2   0    11          NA
12:           2   1    12          11
13:           2   2    13          12
14:           2   3    14          13
15:           2   4    15          14
16:           2   5    16          15
17:           2   6    17          16
18:           2   7    18          17
19:           2   8    19          18
20:           2   9    20          19
21:           3   0    21          NA
22:           3   1    22          21
23:           3   2    23          22
24:           3   3    24          23
25:           3   4    25          24
26:           3   5    26          25
27:           3   6    27          26
28:           3   7    28          27
29:           3   8    29          28
30:           3   9    30          29
    Participant Day Value LaggedValue
2
On

Let's break down what your requirement is:

1) You need a lagged column. Don't use the built-in lag() in R for that: stats::lag() is designed for time-series objects and does not shift the values of a plain vector, and dplyr masks it with a different lag(), so the results can conflict. I would suggest using Lag() (with a capital L) from the Hmisc package instead.

2) The second part of the question says the lagging should be done per Participant. This is a grouping operation, and data.table handles it neatly: the last line of the code uses by inside the brackets to do the grouping. The best part is that := adds the column to the data.table in place, so there is no need for any conversion to a data table or data frame, which you may need to do if you are using dplyr.
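
To illustrate point 1, here is a small base R sketch of why the built-in lag() is a poor fit for plain vectors. The vector x is a made-up example, not data from the question.

```r
x <- c(10, 20, 30, 40, 50)   # hypothetical example vector

# stats::lag() only moves the time-series attribute (tsp); the values stay put
y <- stats::lag(x, k = 1)
as.vector(y)      # same values as x, nothing has been shifted
attr(y, "tsp")    # the time base is what changed

# An actual one-step shift of the values, written by hand
shifted <- c(NA, head(x, -1))
shifted
```

The hand-written shift is roughly what a vector-oriented lag function does for a lag of one; the point of the sketch is only that stats::lag() does something different.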

So the code can be:

library(data.table)
library(Hmisc)

test <- data.table(Participant = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
                   Day = c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9),
                   Value = c(1:30))

test[, LaggedValue := Lag(Value), by = 'Participant']