I would like to select multiple columns from a data.table (with 1200 column-names) specifying a range by column-name, like one can do with dplyr, for example:
library(data.table)
library(dplyr)
dt <- data.table(w = sample(100, 50),
x = sample(100, 50),
y = sample(100, 50),
z = sample(100, 50))
select(dt, w:y)
Currently I am using the following work-around:
cols_to_select <- names(select(dt, w:y))
dt[ ,cols_to_select, with = FALSE]
I think the other option of using the col-numbers (for example dt[ , 1:3, with = FALSE]
could lead to nasty bugs.
Another option to select the names would be:
dt[ , .SD, .SDcols = cols_to_select]
It would be super-awesome if there was something like the following:
dt[ , .(w:y)]
Is there a better way to do this? And if not why? If this question is better placed at the github-issues of data.table let me know
What I asked for is possible in the development-version of data.table (1.9.5) as can be seen in the new feature number 17. quoting:
the installation of the development-version is explained here
Thanks for pointing this out @AnandaMahto and @Arun!
also using
select_vars(names(dt), w:y)
from dplyr is probably better thannames(select(dt, w:y))
as pointed out by @shadow, because it gives more options to select the names, is more readable and conveys the intent more clearly.