Using := multiple times in data.table

304 Views Asked by At

I frequently find myself doing a long series of chained calculations using := on the same data table. For example, something like this

test = data.table(1:10, 1:10, 1:10, 1:10)

test[, V1 := V1^2]
test[, V2 := V1*V2]
test[, V3 := V2/V3]
test[, V4 := sqrt(V3)]
test[, new := letters[V4]]

Having to write test[, ...] on every line 1) takes longer to type (not a big issue I can cope with that). But, more importantly, also distracts visually from the flow and content of the calculation. I would much rather write something like

test[, {
  V1 := V1^2
  V2 := V1*V2
  V3 := V2/V3
  V4 := sqrt(V3)
  new := letters[V4]
}]

But this throws an error of You have wrapped := with {} which is ok but then := must be the only thing inside {}.

I know that I can write

within(test, {
  V1 = V1^2
  V2 = V1*V2
  V3 = V2/V3
  V4 = sqrt(V3)
  new = letters[V4]
  })

But this loses the efficiency of using :=

I tried writing a function to provide this ability

with.dt = function(dt, expressions){
  e = strsplit(expressions,'\n')
  for (i in 1:length(e)){    
    dt[, eval(parse(text = e[i]))]
  }
  dt
  }

with.dt(test, "
  V1 := V1^2;
  V2 := V1*V2;
  V3 := V2/V3;
  V4 := sqrt(V3);
  new := letters[V4];
  ")

But this does not change the content of the data.table

Is there any syntax that permits the neat look of the within version, while retaining the assign by reference property of :=?

2

There are 2 best solutions below

1
Kristofersen On

There are a couple different ways to do this.

Original test matrix used here:

   v1 v2
1:  1  3
2:  2  4

First, we can do something like this:

test[,`:=`(v1 = v1^2,
          v2 = v1*v2)

Output:

v1 v2
 1  3
 4  8

Or, if we want it done sequentially we can use Frank's hack.

test[, `:=`(v1 = v1 <- v1^2, v2 = v2 * v1)]

Output:

v1 v2
 1  3
 4 16  

Or, similarly we could run this:

test[,c("v1","v2") := list(v1^2,v1^2*v2)]

Output:

v1 v2
 1  3
 4 16
6
dww On

We can use a function that takes a list of expressions:

with.dt = function(dt, expr){
  for (j in 1:length(expr)) set(dt, , names(expr)[j], dt[, eval(expr[[j]])])
}

test = data.table(1:10, 1:10, 1:10, 1:10)
with.dt(test, expression(
  V1 = V1^2,
  V2 = V1*V2,
  V3 = V2/V3,
  V4 = sqrt(V3),
  new = letters[V4]
))

#     V1   V2  V3 V4 new
# 1:   1    1   1  1   a
# 2:   4    8   4  2   b
# 3:   9   27   9  3   c
# 4:  16   64  16  4   d
# 5:  25  125  25  5   e
# 6:  36  216  36  6   f
# 7:  49  343  49  7   g
# 8:  64  512  64  8   h
# 9:  81  729  81  9   i
#10: 100 1000 100 10   j