I have the following code that selects 4 random rows of iris 1,000 times and takes the mean of each 4-row sample:
library(dplyr)
iris<- iris
storage<- list()
counter<- 0
for (i in 1:1000) {
# sample 4 randomly selected rows (without replacement) on each of the 1000 iterations
tempsample <- iris[sample(nrow(iris), 4, replace = FALSE), ]
storage[[i]] <- tempsample
counter<- counter+1
print(counter)
}
# Unpack results into dataframe
results<- do.call(rbind, storage)
View(results)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/4)),each = 4))
# View(results_2)
final_results<- aggregate(results_2[,1:4], list(results_2$Aggregate), mean)
# View(final_results)
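For comparison, here is a more compact base R sketch of the same step. replicate() draws the 1000 four-row samples and colMeans() averages the four numeric columns of each one, so no counter, list or grouping column is needed. The names X and means_df are illustrative and not from the original code.

```r
# Sketch: same result as the loop above, without a counter, list or grouping column
set.seed(1)                                   # for reproducibility only
X <- as.matrix(iris[, 1:4])                   # the four numeric columns
means_df <- as.data.frame(t(replicate(1000,
  colMeans(X[sample(nrow(X), 4), , drop = FALSE])  # means of one random 4-row sample
)))
dim(means_df)  # 1000 rows, 4 columns
```

Because means_df has no Group.1 column, you would pass means_df itself to bias() rather than final_results[,2:5].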
I want to calculate the bias of each column relative to its true population parameter, for example using SimDesign's bias():
library(SimDesign)
(bias(final_results[,2:5], parameter=c(5,3,2,1), type='relative'))*100
In this code, the values of parameter are hypothetical true population values for each column of the data frame. I want to repeat this whole process 100 times to get a distribution of bias estimates for each variable. However, I'm not sure how to fit all of this into a for loop (which I think would be the way to go) so that the final output is a data frame with 100 rows of bias measurements, one column per iris variable.
Any help with this would be greatly appreciated.
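A minimal base R sketch of the requested for loop is below. It assumes that SimDesign's type = 'relative' computes, for each column, (mean of the estimates minus the true value) divided by the true value; check ?SimDesign::bias for your installed version. To keep the sketch self-contained, that formula is written out by hand instead of calling bias(). The names true_values, n_reps, n_draws and bias_mat are illustrative.

```r
# Sketch (base R only): repeat the 1000-sample procedure 100 times and keep one
# row of relative bias (in %) per repetition.
set.seed(42)                        # for reproducibility only
true_values <- c(5, 3, 2, 1)        # the question's hypothetical true values
n_reps  <- 100                      # outer repetitions -> rows of the output
n_draws <- 1000                     # 4-row samples per repetition
X <- as.matrix(iris[, 1:4])         # numeric columns only, as a matrix for speed

bias_mat <- matrix(NA_real_, nrow = n_reps, ncol = ncol(X),
                   dimnames = list(NULL, colnames(X)))
for (r in seq_len(n_reps)) {
  # n_draws x 4 matrix: column means of each random 4-row sample
  means <- t(replicate(n_draws, colMeans(X[sample(nrow(X), 4), , drop = FALSE])))
  # relative bias in %, assuming (mean(estimate) - true) / true;
  # with SimDesign this would presumably be
  # bias(means, parameter = true_values, type = "relative") * 100
  bias_mat[r, ] <- (colMeans(means) - true_values) / true_values * 100
}
bias_df <- as.data.frame(bias_mat)  # 100 rows, one column per iris variable
head(bias_df)
```

Preallocating bias_mat avoids growing a list inside the loop, and hist(bias_df$Sepal.Length) then shows the distribution for one variable. Because the samples are drawn from iris itself, each column's bias will cluster around the gap between the actual iris column mean and the hypothetical value in true_values.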
#------------------------------
Update
Trying to run the same code for a stratified sample instead of a random sample gives me the following error:
Error in [.data.table(setDT(copy(iris)), as.vector(sapply(1:1000, function(X) stratified(iris, :
i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT
I think this might be related to setDT?
This is a result of the following code:
do.call(rbind,lapply(1:100, function(x) {
bias(
setDT(copy(iris))[as.vector(sapply(1:1000, function(X) stratified(iris,group="Species", size=1)))][
, lapply(.SD, mean), by=rep(c(1:1000),4), .SDcols=c(1:4)][,c(2:5)],
parameter=c(5,3,2,1),
type='relative'
)
}))
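One likely cause of this error is not setDT() but the indexing. stratified() from splitstackshape returns the sampled rows themselves, not their row numbers. sapply() then simplifies the 1000 sampled tables into a list-matrix, and as.vector() does not drop the dimensions of a list, so data.table's i arrives as a matrix. The base R sketch below draws stratified row indices instead, so the result is a plain integer vector like in the random-sample version. stratified_rows is a hypothetical helper, not a splitstackshape function.

```r
# Hypothetical helper: return `size` row indices drawn at random from each
# level of `group`, so the result can be used directly as i in iris[i, ]
stratified_rows <- function(group, size = 1) {
  idx_by_group <- split(seq_along(group), group)
  unlist(lapply(idx_by_group, function(idx) idx[sample.int(length(idx), size)]),
         use.names = FALSE)
}

set.seed(7)                                      # for reproducibility only
rows <- stratified_rows(iris$Species, size = 1)  # one row index per species
iris[rows, ]                                     # 3 rows, one from each species

# 1000 stratified draws -> a plain integer vector of 3000 row indices
all_rows <- as.vector(replicate(1000, stratified_rows(iris$Species, size = 1)))
length(all_rows)  # 3000
```

Note that each draw here contributes 3 consecutive rows, so a grouping vector for the per-draw means would presumably need to be rep(1:1000, each = 3) rather than rep(c(1:1000),4).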
I looked into using the following code which was suggested:
library(data.table)  # needed for rbindlist(), := and .SD
get_samples <- function(n, sampsize=4) {
rbindlist(lapply(1:n, function(x) {
splitstackshape::stratified(iris, group="Species",sampsize)[, id:=x] }))[
, lapply(.SD, mean), by=.(Species, id)] }
I think I understand what this function is doing (for each of n iterations, drawing 4 rows from each species with stratified(), tagging them with an id, and then taking the mean of each column by species and id), but I'm not sure how to apply it to the original question of doing this (4*1000)*100 times to test the bias (I'm very new at this, so apologies if I'm missing something obvious).
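Instead of calling get_samples() directly, here is a base R sketch of the full (4 rows per species) x 1000 x 100 procedure. It mirrors the do.call(rbind, lapply(1:100, ...)) structure of the earlier attempt. One difference from get_samples(): it averages over all 12 sampled rows (pooled across species) rather than by species, and compares those pooled means with the same hypothetical values. The relative bias formula (mean - true) / true is assumed, and one_rep and rows_by_species are hypothetical names.

```r
# Sketch (base R only): 100 repetitions of 1000 stratified samples with
# 4 rows per species (12 rows per sample), pooled across species.
set.seed(123)                               # for reproducibility only
true_values <- c(5, 3, 2, 1)                # the question's hypothetical true values
X <- as.matrix(iris[, 1:4])
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)  # row indices per stratum

# Hypothetical helper: one repetition -> named vector of relative bias (%)
one_rep <- function(n_draws = 1000, size = 4) {
  means <- t(replicate(n_draws, {
    rows <- unlist(lapply(rows_by_species,
                          function(idx) idx[sample.int(length(idx), size)]),
                   use.names = FALSE)
    colMeans(X[rows, , drop = FALSE])
  }))
  # relative bias in %, assuming (mean(estimate) - true) / true
  (colMeans(means) - true_values) / true_values * 100
}

strat_bias <- as.data.frame(do.call(rbind, lapply(1:100, function(r) one_rep())))
dim(strat_bias)  # 100 rows, 4 columns
```

To keep the per-species breakdown that get_samples() produces, you would need separate true values for each species. The question does not provide those.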
Here is one way to do that. I've made some minor changes to your code and wrapped it in a function. Then use lapply over a sequence, say 1:10 or 1:100, running your function each time and feeding the result to your bias function from the SimDesign package. Then row-bind the resulting list.
Output:
Fast approach to the problem