I'm trying to sample a dummy based on probabilities that are part of my data.table. If my data.table only has two rows, this works:
library(data.table)
playdata <- data.table(id = c("a","b"), probabilities = c(0.2, 0.3))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]
If it has three or more rows, it does not:
library(data.table)
playdata <- data.table(id = c("a","b","c"), probabilities = c(0.2, 0.3, 0.4))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]
Error in sample.int(length(x), size, replace, prob) :
incorrect number of probabilities
Can someone explain this? I know I can force any function to run row by row, but why does sample break the standard data.table syntax? Shouldn't it do everything row by row anyway?
Edit: the usual workaround throws the same error:
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities), by = seq_len(nrow(playdata))]
I think you need to do the sampling row-wise, so I'll demo with sapply:
Though I suspect that it might be easier for you to use runif(.N) < probabilities, assuming probabilities is the chance of drawing a 1?
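To make the row-wise idea concrete, here is a minimal sketch of both approaches. It assumes that probabilities holds the chance of drawing a 1 in each row, which the question does not state outright. The column name sampled_dummy_runif and the set.seed call are mine, added only for illustration. The underlying issue is that data.table evaluates j once on the whole column, so sample() receives every probability at once, and prob has to have exactly one weight per element of c(0, 1). With by = seq_len(nrow(playdata)) it receives a single weight per group, which is again the wrong length.

```r
library(data.table)

set.seed(42)  # only so the illustration is reproducible
playdata <- data.table(id = c("a", "b", "c"), probabilities = c(0.2, 0.3, 0.4))

# In the failing call, sample() gets three weights for the two values in
# c(0, 1). With two rows the lengths happen to match, so it runs, but the
# weights are then P(0) and P(1) for one draw that is recycled to all rows.

# Row-wise draw: one call to sample() per probability p,
# with explicit weights for 0 and 1 (assumes p is the chance of a 1).
playdata[, sampled_dummy := sapply(
  probabilities,
  function(p) sample(c(0, 1), 1, prob = c(1 - p, p))
)]

# Vectorised alternative: one uniform draw per row, compared with p.
# runif(.N) < p is TRUE with probability p.
playdata[, sampled_dummy_runif := as.integer(runif(.N) < probabilities)]

print(playdata)
```

The runif version avoids the per-row function call, so it should scale better on large tables. The sapply version stays closer to the original sample() call if you later need more than two outcomes.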