R - implement bootstrap function for paired (mis)matches

278 Views Asked by giac At 20 December 2016 at 19:26

I have trouble implementing a function in the boot library.

The function I want to implement is the following

fsyn = function(x) sum( x[1,] == x [2,] )

It's a count of the matches between two sequences.

My data is a set of sequences such as

   id V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1  c  a  c  b  c  c  b  d  d   a
2   1  c  d  a  a  c  b  d  a  b   a
3   2  b  d  c  b  b  b  c  d  a   b
4   2  b  a  b  c  b  c  d  b  a   d

Something important about these sequences is the fact that they are paired by id.

I am interested in doing two things, the first is to bootstrap the number of matches by id and the second for two random individuals.

The first procedure can be implemented by

library(dplyr) 

chid = df$id
# sampling paired sequences # 
wchid = function(chid) which(chid %in% sample(chid, 1))
# the matches function # 
fsyn = function(x) sum( x[1,] == x [2,] ) 
# wrapping the function # 
funcHamC = function(df) df[wchid(chid), -1] %>% fsyn

with

df %>% funcHamC

The second function can simply be written like

funcHamR =  function(df) df[sample(df$id, 2), -1] %>% fsyn
df %>% funcHamR

However, I have issues using these two functions with boot.

library(boot)
boot(df, funcHamC, R = 10)
boot(df, funcHamR, R = 10)

This isn't working. Any idea ?

data

df = as.data.frame( t(replicate(20, sample(letters[1:4], 10, T))) ) 
df$id = rep(1:10, 2)
df = df %>% select(id, everything()) %>% arrange(id)

Original Q&A

There are 1 best solutions below

Edward Carney On 20 December 2016 at 21:06 BEST ANSWER

The boot function expects two arguments to the statistic function—the second being a parameter to specify the sample values to choose. Because you are using your own methods to choose randomly from the data, you should set the sim argument to 'parametric'. This will use the ran.gen argument to specify a function to generate random values from the data.

To quote from the help file: "If ran.gen is not specified, the default is a function which returns the original data in which case all simulation should be included as part of statistic."

Save the boot output to variables—such as C.boot and R.boot and you will find the samples in C.boot$t and R.boot$t.

C.boot <- boot(df, statistic=funcHamC, R = 10, sim='parametric')
R.boot <- boot(df, statistic=funcHamR, R = 10, sim='parametric')

You can then obtain the stats you need from the generated values.

R - implement bootstrap function for paired (mis)matches

There are 1 best solutions below

Related Questions in R

Related Questions in SEQUENCE

Related Questions in STATISTICS-BOOTSTRAP

Trending Questions

Popular # Hahtags

Popular Questions