use column-name range in data.table like in dplyrs select

615 Views Asked by At

I would like to select multiple columns from a data.table (with 1200 column-names) specifying a range by column-name, like one can do with dplyr, for example:

library(data.table)
library(dplyr)
dt <- data.table(w = sample(100, 50),
       x = sample(100, 50),
       y = sample(100, 50),
       z = sample(100, 50))

select(dt, w:y)

Currently I am using the following work-around:

cols_to_select <- names(select(dt, w:y))
dt[ ,cols_to_select, with = FALSE]

I think the other option of using the col-numbers (for example dt[ , 1:3, with = FALSE] could lead to nasty bugs. Another option to select the names would be:

dt[ , .SD, .SDcols = cols_to_select]

It would be super-awesome if there was something like the following:

dt[ , .(w:y)]

Is there a better way to do this? And if not why? If this question is better placed at the github-issues of data.table let me know

1

There are 1 best solutions below

2
On BEST ANSWER

What I asked for is possible in the development-version of data.table (1.9.5) as can be seen in the new feature number 17. quoting:

  1. .SDcols and with=FALSE understand colA:colB form now. That is, DT[, lapply(.SD, sum), by=V1, .SDcols=V4:V6] and DT[, V5:V7, with=FALSE] works as intended. This is quite useful for interactive use. Closes #748.

the installation of the development-version is explained here

Thanks for pointing this out @AnandaMahto and @Arun!

also using select_vars(names(dt), w:y) from dplyr is probably better than names(select(dt, w:y)) as pointed out by @shadow, because it gives more options to select the names, is more readable and conveys the intent more clearly.