Calculating stats for random subsample using R loop

1.5k Views Asked by Kevin T At 15 June 2015 at 18:02

I am trying to find a way in R to randomly subset some data (the proportion of suitable habitat in an area, for an ecological study), calculate the mean and the proportion of samples with values > 0, and then save or append those values to a data frame. I then want to repeat this a number of times (1000 for the example). Standard bootstrapping or resampling packages won't work, as I need to calculate the frequency of occurrence as well as the mean of the subsample. I'm aware of the "apply" functions, but those loop over the entire data frame, whereas I'm trying to do it on a repeated subsample. I know I need some code to save and output the values calculated in the loop, but I'm having issues. "habprop" is the column in a data frame ("data") for which I want to calculate and save the mean and the proportion of positive values.

for (i in 1:1000) {
  randsample=data[sample(1:nrow(data), 50, replace=FALSE),]
  m=mean(randsample$habprop)
  randsamplepos=subset(randsample, habprop > 0)
  habfreq=(nrow(randsamplepos)/nrow(randsample))
  # m and habfreq are overwritten on every pass; nothing is saved yet
}


There are 2 best solutions below

ajb On 15 June 2015 at 18:19

How about the replicate function? This post looks pretty similar.

Generating some data to work on

data <- data.frame(x1=rpois(5000, 5), x2=runif(5000), x3=rnorm(5000))

Defining a function to sample and take means and counts

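sample_stats <- function(df, n=100){
  # draw n rows at random, without replacement
  df <- df[sample(1:nrow(df), n, replace=FALSE),]
  # mean of the positive values only
  mx1 <- mean(df$x1[df$x1>0])
  # count (not proportion) of positive values
  x1pos <- sum(df$x1>0)
  return(c(mx1, x1pos))
}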

Run it once just to see the output

sample_stats(data)

Run it 1000 times

results <- replicate(1000, sample_stats(data, n=100))
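
Here replicate() returns a matrix with one column per run (2 rows by 1000 columns), so it needs transposing before it looks like the data frame the question asks for. A minimal sketch of that step, assuming a column named habprop and a subsample size of 50 as in the question. The simulated habitat data and the helper subsample_stats are hypothetical stand-ins, not part of the original answer:

```r
set.seed(42)  # only to make the sketch reproducible

# Simulated stand-in for the asker's data: proportions in [0, 1] with many exact zeros
habitat <- data.frame(habprop = ifelse(runif(5000) < 0.4, 0, runif(5000)))

# Hypothetical helper: mean and proportion of positive values in one subsample
subsample_stats <- function(df, n = 50) {
  s <- df$habprop[sample(nrow(df), n, replace = FALSE)]
  c(mean = mean(s), habfreq = mean(s > 0))
}

# replicate() gives a 2 x 1000 matrix; transpose so that each run becomes a row
results <- as.data.frame(t(replicate(1000, subsample_stats(habitat))))
head(results)
```

Using mean(s > 0) gives the proportion directly, since the mean of a logical vector is the fraction of TRUE values.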

Rorschach On 15 June 2015 at 18:30

Using boot, this should be possible:

dat <- data.frame(habprop=rnorm(100))

## Function to return statistics from subsamples
stat <- function(dat, inds)
    with(dat, c(mu=mean(habprop[inds]), freq=sum(habprop[inds] > 0)/length(inds)))

library(boot)
boot(data=dat, statistic=stat, R=1000)

# Bootstrap Statistics :
#        original      bias    std. error
# t1* -0.06154533 -0.00324393  0.08377116
# t2*  0.52000000 -0.00073000  0.04853991
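
The printed summary only reports bias and standard error. If the individual replicate values are wanted, they are kept in the t component of the object that boot() returns, one row per replicate and one column per statistic. Note also that, by default, boot() resamples with replacement at the full sample size, which is not the same scheme as drawing 50 rows without replacement in the question. A sketch, where b and reps are illustrative names:

```r
library(boot)

set.seed(1)  # only to make the sketch reproducible
dat <- data.frame(habprop = rnorm(100))

# Same statistic as above: mean and proportion of positive values
stat <- function(dat, inds)
  with(dat, c(mu = mean(habprop[inds]), freq = mean(habprop[inds] > 0)))

b <- boot(data = dat, statistic = stat, R = 1000)

# b$t is an R x 2 matrix of replicate values; label the columns explicitly
reps <- as.data.frame(b$t)
names(reps) <- c("mu", "freq")
head(reps)
```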
