I have trouble implementing a function in the boot library.
The function I want to implement is the following
fsyn = function(x) sum( x[1,] == x [2,] )
It's a count of the matches between two sequences.
My data is a set of sequences such as
id V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 c a c b c c b d d a
2 1 c d a a c b d a b a
3 2 b d c b b b c d a b
4 2 b a b c b c d b a d
Something important about these sequences is the fact that they are paired by id.
I am interested in doing two things, the first is to bootstrap the number of matches by id and the second for two random individuals.
The first procedure can be implemented by
library(dplyr)
chid = df$id
# sampling paired sequences #
wchid = function(chid) which(chid %in% sample(chid, 1))
# the matches function #
fsyn = function(x) sum( x[1,] == x [2,] )
# wrapping the function #
funcHamC = function(df) df[wchid(chid), -1] %>% fsyn
with
df %>% funcHamC
The second function can simply be written like
funcHamR = function(df) df[sample(df$id, 2), -1] %>% fsyn
df %>% funcHamR
However, I have issues using these two functions with boot.
library(boot)
boot(df, funcHamC, R = 10)
boot(df, funcHamR, R = 10)
This isn't working. Any idea ?
data
df = as.data.frame( t(replicate(20, sample(letters[1:4], 10, T))) )
df$id = rep(1:10, 2)
df = df %>% select(id, everything()) %>% arrange(id)
The
bootfunction expects two arguments to thestatisticfunction—the second being a parameter to specify the sample values to choose. Because you are using your own methods to choose randomly from the data, you should set thesimargument to'parametric'. This will use theran.genargument to specify a function to generate random values from the data.To quote from the help file: "If
ran.genis not specified, the default is a function which returns the originaldatain which case all simulation should be included as part ofstatistic."Save the boot output to variables—such as
C.bootandR.bootand you will find the samples inC.boot$tandR.boot$t.You can then obtain the stats you need from the generated values.