data.table shift() in v1.15.2 not working when rows are subset in i by column - `DT[i == TRUE, (cols) := shift(), by = col]`

Question

108 Views Asked by aattp At 07 March 2024 at 13:36

With data.table v1.14.2 (R 4.2.1; edit: also v1.14.8 with R 4.2.3), I could use shift() to assign new columns in j by group after subsetting rows in i. The same code no longer works with data.table v1.15.2 (R 4.3.3).

Here is some sample data:

library(data.table)
set.seed(1)
data <- data.table(iris)[Species %in% c(
    'versicolor', 'virginica'
), .(Species, value = Petal.Width)][, .SD[1:8], by = Species]
data[, to.keep := runif(.N) > .3]
data # N:16, N[TRUE]:11

#        Species value to.keep
#  1: versicolor   1.4   FALSE
#  2: versicolor   1.5    TRUE
#  3: versicolor   1.5    TRUE
#  4: versicolor   1.3    TRUE
#  5: versicolor   1.5   FALSE
#  6: versicolor   1.3    TRUE
#  7: versicolor   1.6    TRUE
#  8: versicolor   1.0    TRUE
#  9:  virginica   2.5    TRUE
# 10:  virginica   1.9   FALSE
# 11:  virginica   2.1   FALSE
# 12:  virginica   1.8   FALSE
# 13:  virginica   2.2    TRUE
# 14:  virginica   2.1    TRUE
# 15:  virginica   1.7    TRUE
# 16:  virginica   1.8    TRUE

Using data.table v1.14.2 (R 4.2.1), I am able to create lag columns by group considering only certain values in i:

mycols <- paste0('lag.', 1:3)
data[to.keep == TRUE, (mycols) := shift(value, n = 1:3, type = 'lag'), by = Species]
data
#        Species value to.keep lag.1 lag.2 lag.3
#  1: versicolor   1.4   FALSE    NA    NA    NA
#  2: versicolor   1.5    TRUE    NA    NA    NA
#  3: versicolor   1.5    TRUE   1.5    NA    NA
#  4: versicolor   1.3    TRUE   1.5   1.5    NA
#  5: versicolor   1.5   FALSE    NA    NA    NA
#  6: versicolor   1.3    TRUE   1.3   1.5   1.5
#  7: versicolor   1.6    TRUE   1.3   1.3   1.5
#  8: versicolor   1.0    TRUE   1.6   1.3   1.3
#  9:  virginica   2.5    TRUE    NA    NA    NA
# 10:  virginica   1.9   FALSE    NA    NA    NA
# 11:  virginica   2.1   FALSE    NA    NA    NA
# 12:  virginica   1.8   FALSE    NA    NA    NA
# 13:  virginica   2.2    TRUE   2.5    NA    NA
# 14:  virginica   2.1    TRUE   2.2   2.5    NA
# 15:  virginica   1.7    TRUE   2.1   2.2   2.5
# 16:  virginica   1.8    TRUE   1.7   2.1   2.2

However, trying the same code in data.table v1.15.2 (R 4.3.3) results in the following error:

Error in `[.data.table`(data, to.keep == TRUE, `:=`((mycols), shift(value,  : 
  Supplied 16 items to be assigned to 11 items of column 'lag.1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Of course, there are alternatives that achieve my goal. For instance, I can redundantly repeat the row subset inside j:

data[to.keep == TRUE, (mycols) := shift(value[to.keep == TRUE], n = 1:3, type = 'lag'), by = Species]

However, my understanding is that the original code should work as well. Am I missing anything?

I have checked for changes in the docs (man/shift.Rd) and the function definition (R/shift.R) but have not found any relevant change (for instance, I tried shift(..., fill='NA'), with the same result). I have not found any related question on Stack Overflow either.

**MichaelChirico** · Accepted Answer · 2024-03-12T18:05:16.210000

Edit: fix is now on CRAN (1.15.4+)

Unfortunately, you were affected by a regression in versions 1.15.0 and 1.15.2: https://github.com/Rdatatable/data.table/issues/5962.

That issue is now fixed on CRAN as of v1.15.4.
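
To see whether an installation falls in the affected range, something like the following should do. It is only a sketch built from the version numbers named in this answer (regression in 1.15.0 and 1.15.2, fix in 1.15.4); the variable names are illustrative.

```r
# Compare the installed data.table version against the range named above.
v <- packageVersion("data.table")

# TRUE for releases from 1.15.0 up to, but not including, 1.15.4.
affected <- v >= "1.15.0" && v < "1.15.4"

if (affected) {
  message("data.table ", v, " is in the affected range; ",
          "updating from CRAN, e.g. install.packages('data.table'), should pick up the fix.")
} else {
  message("data.table ", v, " is outside the affected range.")
}
```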

You didn't find any change in shift.R because the relevant change is in gshift(), the group-optimized lag computation that was added in 1.15.0. It is only used inside `[` queries.
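
If upgrading is not an option, one possible stopgap is to keep the query from being group-optimized at all. The sketch below rebuilds the question's sample data and temporarily lowers the `datatable.optimize` option. It rests on the assumption that a level below 2 disables GForce, and with it gshift(), so that the ordinary shift() runs per group; treat it as a workaround to try, not as the fix.

```r
library(data.table)

# Rebuild the sample data from the question.
set.seed(1)
data <- data.table(iris)[Species %in% c('versicolor', 'virginica'),
                         .(Species, value = Petal.Width)][, .SD[1:8], by = Species]
data[, to.keep := runif(.N) > .3]

mycols <- paste0('lag.', 1:3)

# Assumption: optimization level 1 turns off GForce, so gshift() is not used.
old <- options(datatable.optimize = 1L)
data[to.keep == TRUE, (mycols) := shift(value, n = 1:3, type = 'lag'), by = Species]
options(old)  # restore the previous optimization level

data
```

Restoring the option right after the query keeps the performance cost of the slower, unoptimized grouping limited to this one statement.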
