inverse selection for by in R data.table

108 Views Asked by At

I have the following data.table:

DT <- data.table(id=c("A","A","B","B","C","C"),condition=c(1,2,1,2,1,2),value=c(0,1,1,3,2,2))

For each value of id and condition, I want to compute the mean value for that condition but for all values of id other than the current one. My current solution is:

DT[,meanothers:=DT[id!=ID & condition==CONDITION,mean(value)],by=.(ID=id,CONDITION=condition)]

Is there a faster or more memory-efficient solution to this problem in data.table?

2

There are 2 best solutions below

2
Roland On

Using the idea from one of my old answers:

DT[, c("Sum", "N") := .(sum(value), .N), by = condition]
DT[, c("sum", "n") := .(sum(value), .N), by = .(id, condition)]
DT[, meanothers1 := (Sum - sum) / (N - n)]
#        id condition value meanothers   Sum     N   sum     n meanothers1
#    <char>     <num> <num>      <num> <num> <int> <num> <int>       <num>
# 1:      A         1     0        1.5     3     3     0     1         1.5
# 2:      A         2     1        2.5     6     3     1     1         2.5
# 3:      B         1     1        1.0     3     3     1     1         1.0
# 4:      B         2     3        1.5     6     3     3     1         1.5
# 5:      C         1     2        0.5     3     3     2     1         0.5
# 6:      C         2     2        2.0     6     3     2     1         2.0
0
Yuriy Saraykin On
library(data.table)

dt <- data.table(id=c("A","A","B","B","C","C"),condition=c(1,2,1,2,1,2),value=c(0,1,1,3,2,2))

dt[, meanothers := mapply(
  FUN = function(x, y) weighted.mean(value, w = (x != id) * (y == condition)),
  x = id,
  y = condition
)][]
#>    id condition value meanothers
#> 1:  A         1     0        1.5
#> 2:  A         2     1        2.5
#> 3:  B         1     1        1.0
#> 4:  B         2     3        1.5
#> 5:  C         1     2        0.5
#> 6:  C         2     2        2.0

Created on 2024-03-06 with reprex v2.0.2