I've written two functions; one works and one doesn't. Why does sampling in the first function just duplicate my results across replications, instead of producing a new shuffling (without replacement) each time?
Probably missing something obvious, but not sure what...
D1 <- matrix(runif(10*12, 0, 2), ncol=12)
perm_fun1 <- function(DF){
# permute each column
smp <- sample(1:ncol(DF), replace=FALSE)
new_data <- DF[, smp]
# run PCA
pc.perm.out <- prcomp(new_data, center=TRUE, scale.=TRUE)
# get the proportion of variance of each PC.perm
pve.perm=(pc.perm.out$sdev^2/sum(pc.perm.out$sdev^2))
}
perm_fun2 <- function(DF){
# permute the values within each column
new_data <- apply(DF, 2, sample)
# run PCA
pc.perm.out <- prcomp(new_data, center=TRUE, scale.=TRUE)
# get the proportion of variance of each PC.perm
pve.perm=(pc.perm.out$sdev^2/sum(pc.perm.out$sdev^2))
}
out_smp1 <- sapply(1:5, function(i) perm_fun1(D1))
out_smp2 <- sapply(1:5, function(i) perm_fun2(D1))
out_smp1 is a 10x5 matrix, but all five columns are identical: within each row, the same value is repeated for every replication. How would I change perm_fun1 so that it doesn't do that?
EDIT:
Function 2 is the correct one for my purpose.
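For anyone else hitting this, here is a minimal sketch of what seems to be going on. It assumes the goal is a permutation null for the proportion of variance explained. perm_fun1 only reorders whole columns, which relabels the variables but leaves the correlation structure, and so the PC variances, unchanged. perm_fun2 shuffles the values within each column independently, which breaks the correlations between columns. The helper name `pve` and the fixed seed are my own additions for illustration, not part of the original functions.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
D1 <- matrix(runif(10 * 12, 0, 2), ncol = 12)

# hypothetical helper: proportion of variance explained by each PC
pve <- function(X) {
  s <- prcomp(X, center = TRUE, scale. = TRUE)$sdev
  s^2 / sum(s^2)
}

pve_orig    <- pve(D1)
pve_reorder <- pve(D1[, sample(ncol(D1))])  # what perm_fun1 does: reorder columns
pve_shuffle <- pve(apply(D1, 2, sample))    # what perm_fun2 does: shuffle within columns

# Compare each permuted result with the unpermuted one
same_after_reorder <- isTRUE(all.equal(pve_orig, pve_reorder))
same_after_shuffle <- isTRUE(all.equal(pve_orig, pve_shuffle))
print(c(reorder = same_after_reorder, shuffle = same_after_shuffle))
```

If this reasoning is right, the reordered result should match the unpermuted one up to floating-point error and the within-column shuffle should not. That would explain why every replication of perm_fun1 returns the same column.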
A similar question: "PCA explained variance is the same on permutations of data".
The article they reference, which is behind a paywall on Medium: https://towardsdatascience.com/how-to-tune-hyperparameters-of-tsne-7c0596a18868