Using data.table v1.14.2 with R 4.2.1 (edit: also v1.14.8 with R 4.2.3), I was able to use shift() to assign new columns in j by group after subsetting rows in i. The same code no longer works with data.table v1.15.2 (R 4.3.3).
Here is some sample data:
library(data.table)
set.seed(1)
data <- data.table(iris)[Species %in% c(
'versicolor', 'virginica'
), .(Species, value = Petal.Width)][, .SD[1:8], by = Species]
data[, to.keep := runif(.N) > .3]
data # N:16, N[TRUE]:11
# Species value to.keep
# 1: versicolor 1.4 FALSE
# 2: versicolor 1.5 TRUE
# 3: versicolor 1.5 TRUE
# 4: versicolor 1.3 TRUE
# 5: versicolor 1.5 FALSE
# 6: versicolor 1.3 TRUE
# 7: versicolor 1.6 TRUE
# 8: versicolor 1.0 TRUE
# 9: virginica 2.5 TRUE
# 10: virginica 1.9 FALSE
# 11: virginica 2.1 FALSE
# 12: virginica 1.8 FALSE
# 13: virginica 2.2 TRUE
# 14: virginica 2.1 TRUE
# 15: virginica 1.7 TRUE
# 16: virginica 1.8 TRUE
Using data.table v1.14.2 (R 4.2.1), I am able to create lag columns by group using only the rows selected in i:
mycols <- paste0('lag.', 1:3)
data[to.keep == TRUE, (mycols) := shift(value, n = 1:3, type = 'lag'), by = Species]
data
# Species value to.keep lag.1 lag.2 lag.3
# 1: versicolor 1.4 FALSE NA NA NA
# 2: versicolor 1.5 TRUE NA NA NA
# 3: versicolor 1.5 TRUE 1.5 NA NA
# 4: versicolor 1.3 TRUE 1.5 1.5 NA
# 5: versicolor 1.5 FALSE NA NA NA
# 6: versicolor 1.3 TRUE 1.3 1.5 1.5
# 7: versicolor 1.6 TRUE 1.3 1.3 1.5
# 8: versicolor 1.0 TRUE 1.6 1.3 1.3
# 9: virginica 2.5 TRUE NA NA NA
# 10: virginica 1.9 FALSE NA NA NA
# 11: virginica 2.1 FALSE NA NA NA
# 12: virginica 1.8 FALSE NA NA NA
# 13: virginica 2.2 TRUE 2.5 NA NA
# 14: virginica 2.1 TRUE 2.2 2.5 NA
# 15: virginica 1.7 TRUE 2.1 2.2 2.5
# 16: virginica 1.8 TRUE 1.7 2.1 2.2
However, trying the same code in data.table v1.15.2 (R 4.3.3) results in the following error:
Error in `[.data.table`(data, to.keep == TRUE, `:=`((mycols), shift(value, :
Supplied 16 items to be assigned to 11 items of column 'lag.1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Of course, there are alternatives to achieve my goal. For instance, I can redundantly subset the rows again in j:
data[to.keep == TRUE, (mycols) := shift(value[to.keep == TRUE], n = 1:3, type = 'lag'), by = Species]
However, my understanding is that the original code should work as well. Am I missing anything?
I have checked for changes in the docs (man/shift.Rd) and in the function definition (R/shift.R) but have not found any relevant change (for instance, I have tried using shift(..., fill='NA') with the same results). I have not found any related question on Stack Overflow either.
Edit: fix is now on CRAN (1.15.4+)
Unfortunately, you were affected by a regression in versions 1.15.0 and 1.15.2: https://github.com/Rdatatable/data.table/issues/5962.
That issue is now fixed on CRAN as of v1.15.4.
You didn't find any change in shift.R because the relevant change is in gshift(), the group-optimized lag computation that was added in 1.15.0. It is only used in `[` queries.
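If upgrading is not possible right away, one option is to keep the grouped query away from the group-optimized path. The sketch below is an assumption on my part rather than something taken from the linked issue: it lowers the datatable.optimize option to 1L, which turns off the grouped optimizations in `[` queries, runs the assignment, and then restores the previous setting. The table dt and its columns g and keep are made up for illustration.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the one from the question)
dt <- data.table(
  g     = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5, 6),
  keep  = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Check whether the installed version already contains the fix
has_fix <- packageVersion("data.table") >= "1.15.4"

# Assumed workaround: temporarily lower the optimization level so that
# shift() in j is evaluated per group by the regular shift() code
old <- options(datatable.optimize = 1L)
dt[keep == TRUE, lag.1 := shift(value, n = 1L, type = "lag"), by = g]
options(old)  # restore the previous optimization level

dt
```

On a fixed version the options() calls are unnecessary, so has_fix can be used to skip them. Rows excluded by i keep NA in the new column, as in the output from v1.14.2 shown in the question.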