My goal is to apply multiple functions to multiple columns AND to have GForce turned on.
Say I have the data.table below:
library(data.table)
df <- data.table(fruit = c('a', 'a', 'a', 'b')
, revenue = 1:4
, profit = c(2,NA,4,5)
); df
   fruit revenue profit
1:     a       1      2
2:     a       2     NA
3:     a       3      4
4:     b       4      5
and I want to apply multiple functions to multiple columns (all except fruit):
# functions
y <- \(i) {c(min(i, na.rm = T)
, max(i, na.rm = T)
)
}
# apply
df[, lapply(.SD, y)
, fruit
, verbose = T
]
Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization changed j from 'lapply(.SD, y)' to 'list(y(revenue), y(profit))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
memcpy contiguous groups took 0.000s for 2 groups
eval(j) took 0.012s for 2 calls
0.020s elapsed (0.020s cpu)
   fruit revenue profit
1:     a       1      2
2:     a       3      4
3:     b       4      5
4:     b       4      5
Now, the above works!
However, notice that the log says (GForce FALSE), so GForce was NOT on.
I think this is because, as Waldi pointed out, GForce is NOT on when a function is wrapped in an anonymous function such as \(i) sum(i).
I then tried the below, passing na.rm = T only in the lapply call:
# functions
z <- \(i) {c(min
, max
)
}
# apply
df[, lapply(.SD, z, na.rm = T)
, fruit
, verbose = T
]
Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization changed j from 'lapply(.SD, z, na.rm = T)' to 'list(z(revenue, na.rm = T), z(profit, na.rm = T))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... Error in z(revenue, na.rm = T) : unused argument (na.rm = T)
This time it fails with the error shown above, specifically: Error in z(revenue, na.rm = T) : unused argument (na.rm = T)
Any help would be much appreciated.
From help("gforce"):
You are obviously not passing an expression containing these functions. They are hidden (to data.table's GForce optimization) inside the y function.
I would do this:
The warning is because of the different column types in df ("integer" and "double"). Ensure they are identical to avoid it.
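For what it's worth, here is a minimal sketch of the idea that the functions must appear directly in j rather than inside a wrapper. It is not necessarily the code the answer originally showed. It assumes the same df as in the question, and the names mins, maxs and res are only illustrative. min and max are each passed straight to lapply(.SD, ...) in their own grouped call, which is the form GForce is documented to optimize, and the two results are then stacked. Running either call with verbose = TRUE should let you confirm whether GForce was actually used on your data.table version.

```r
library(data.table)

# Same example data as in the question
df <- data.table(fruit   = c("a", "a", "a", "b"),
                 revenue = 1:4,
                 profit  = c(2, NA, 4, 5))

# Call min and max directly in j (no wrapper function), one grouped call each,
# so the optimizer can see the function names.
# Add verbose = TRUE to either call to inspect the optimization messages.
mins <- df[, lapply(.SD, min, na.rm = TRUE), by = fruit]
maxs <- df[, lapply(.SD, max, na.rm = TRUE), by = fruit]

# Stack the two results and sort by group, so each fruit has a min row
# followed by a max row (illustrative names, not from the original post)
res <- rbind(mins, maxs)[order(fruit)]
res
```

This trades one pass over the groups for two, but each pass can use the optimized grouped min and max instead of evaluating an R function once per group.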