I'm a beginner in R and am trying to create a birthday paradox function and managed to reach this point, and the result is approximately 0.5, as expected.
k <- 23
sims <- 1000
event <- 0
for (i in 1:sims) {
days <- sample(1:365, k, replace = TRUE)
days.unique <- unique(days)
if (length(days.unique) < k) {
event <- event + 1 }
answer <- event/sims}
answer
However, when I tried to put that into a function, the result was always 0.001. Here is the code:
bdayfunction<- function(k){
sims <- 1000
event <- 0
for (i in 1:sims) {
days <- sample(1:365, k, replace = TRUE)
days.unique <- unique(days)
if (length(days.unique) < k) {
event <- event + 1 }
answer <- event/sims
return (answer)
}
}
What have I done wrong?
Your
returnis not in the right place: it is in the loop (the same holds for youranswercalculation by the way).This works:
In R, you can make use of libraries that allows you to do grouping operation. The two main ones are
data.tableanddplyr. Here, instead of doing a loop, you could try to create a long data.frame with all your simulations, to then calculate the unique number of days per simulation and then count the number of occurrence belowk. Withdplyr:In
data.table:You can test that they provide the same result:
Now lets compare the speed:
You see that
data.tableis slightly faster than your initial loop, and shorter to write.