Why does sampling without replacement in my function just duplicate my output?


I've written two functions; one works and one doesn't. I'm wondering why the sampling in the first example just duplicates my results: why doesn't it create a new shuffling without replacement for each replication?

Probably missing something obvious, but not sure what...

       D1 <- matrix(runif(10*12, 0, 2), ncol=12)

       perm_fun1 <- function(DF){
         # permute the order of the columns (values within each column stay put)
         smp <- sample(1:ncol(DF), replace=FALSE)
         new_data <- DF[, smp]
         # run PCA
         pc.perm.out <- prcomp(new_data, center=TRUE, scale.=TRUE)
         # return the proportion of variance of each PC.perm
         pc.perm.out$sdev^2/sum(pc.perm.out$sdev^2)
       }


       perm_fun2 <- function(DF){
         # permute the values within each column independently
         new_data <- apply(DF, 2, sample)
         # run PCA
         pc.perm.out <- prcomp(new_data, center=TRUE, scale.=TRUE)
         # return the proportion of variance of each PC.perm
         pc.perm.out$sdev^2/sum(pc.perm.out$sdev^2)
       }

       out_smp1 <- sapply(1:5, function(i) perm_fun1(D1))
       out_smp2 <- sapply(1:5, function(i) perm_fun2(D1))

out_smp1 is a 10x5 matrix, but within each row the same value is repeated across all five columns, i.e. every replication gives identical results. How would I change perm_fun1 so that it doesn't do that?

EDIT:

Function 2 is the correct one for my purpose.
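A quick check of the difference between the two approaches, as a minimal sketch. The helper `pve`, the variable names `reordered` and `shuffled`, and the seed are only illustrative assumptions, not part of the original functions. Reordering whole columns keeps every row intact, so the correlations between variables (and therefore the variance explained by each PC) should not change, whereas shuffling within each column breaks those correlations.

```r
set.seed(1)  # assumed seed, only so the check is reproducible
D1 <- matrix(runif(10 * 12, 0, 2), ncol = 12)

# Proportion of variance explained by each principal component
pve <- function(X) {
  s <- prcomp(X, center = TRUE, scale. = TRUE)$sdev
  s^2 / sum(s^2)
}

# What perm_fun1 does: reorder whole columns, rows stay intact
reordered <- D1[, sample(ncol(D1))]
same_after_reorder <- isTRUE(all.equal(pve(D1), pve(reordered)))

# What perm_fun2 does: shuffle the values inside each column independently
shuffled <- apply(D1, 2, sample)
same_after_shuffle <- isTRUE(all.equal(pve(D1), pve(shuffled)))

print(same_after_reorder)
print(same_after_shuffle)
```

If the first comparison reports equality and the second does not, that would explain why perm_fun1 returns the same column for every replication while perm_fun2 does not.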

Useful link: https://towardsdatascience.com/pca-102-should-you-use-pca-how-many-components-to-use-how-to-interpret-them-da0c8e3b11f0

Another similar problem: "PCA explained variance is the same on permutations of data"

Article they reference, which is behind a paywall on Medium: https://towardsdatascience.com/how-to-tune-hyperparameters-of-tsne-7c0596a18868
