Nesting of foreach loops in R

I have code that is very similar to this:

for(i in 1:5){
    # random i x i matrix with entries in [0, 1]
    mat <- matrix(runif(i^2, 0, 1), nrow = i, ncol = i)
    mat.max <- round(max(mat), 2)
    mat.min <- round(min(mat), 2)
    # thresholds to sweep between the rounded min and max
    mat.tresh.seq <- seq(mat.min, mat.max, 0.01)
    # write the original matrix into its own directory
    dir.loc <- paste('~/', i, '/', sep = '')
    dir.create(dir.loc, recursive = TRUE)
    mat.name <- paste(dir.loc, 'og-mat.csv', sep = '')
    write.csv(mat, mat.name)
    dir.loc.2 <- paste(dir.loc, 'treshhold/', sep = '')
    dir.create(dir.loc.2, recursive = TRUE)
    # one 0/1 matrix per threshold value
    for(j in mat.tresh.seq){
        mat.tresh <- mat >= j
        mat.tresh[mat.tresh == TRUE] <- 1
        mat.tresh[mat.tresh == FALSE] <- 0
        mat.tresh.name <- paste(dir.loc.2, 'thresh mat ', j, '.csv', sep = '')
        write.csv(mat.tresh, mat.tresh.name)
    }
}

Each random matrix can be generated independently of the others, and each threshold matrix can be generated independently of the others, but the threshold matrices depend on the random matrices. How would I go about nested parallelization for code like this? Must I choose only one loop to run in parallel?

Thanks.

Best answer:

I tend not to mix data processing with saving data. If you separate those two steps, and separate the two types of matrices, you have all sorts of options for running parallel functions. So my answer to the question of nested loops, where the inner loop depends on the outer but the iterations are otherwise independent, is to unnest them.

# starting matrices
og <- lapply(1:100, function(i){
  matrix(runif(i^2, 0, 1), nrow = i, ncol = i)
})

# threshold matrices: one 0/1 matrix per threshold value, per starting matrix
y <- lapply(og, function(x){
  mat.tresh.seq <- seq(round(min(x), 2), round(max(x), 2), 0.01)
  z <- lapply(mat.tresh.seq, function(j, mat){
    mat.tresh <- mat >= j
    mat.tresh * 1  # coerce logical to 0/1
  }, mat = x)
  names(z) <- mat.tresh.seq
  z
})

# directory/file structure
ynames <- lapply(y, names)

# create all folders
lapply(paste0('~/', seq_along(ynames), '/threshhold'), dir.create, recursive = TRUE)

# write og files
mapply(FUN = function(mainfolder, ogfiles){
  filename <- paste('~/', mainfolder, '/og-mat.csv', sep = '')
  write.csv(ogfiles, filename)
}, mainfolder = seq_along(og), ogfiles = og)

# write threshold files
mapply(mainfolder = seq_along(ynames), filenames = ynames, FUN = function(mainfolder, filenames, ydata){
  lapply(filenames, function(x){
    filename <- paste('~/', mainfolder, '/threshhold/thresh mat ', x, '.csv', sep = '')
    write.csv(ydata[[mainfolder]][[x]], filename)
  })
}, MoreArgs = list(ydata = y))

Every *apply function can instead be its parallel version (clusterMap in place of mapply if you're on Windows). Unless memory is an issue (more than about 100 starting matrices on my computer), you won't need to write each matrix to disk before calculating the next. If memory is a constraint, writing the starting matrices to disk first, then reading each one back in and processing it, might be a good idea.
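
For instance, here is a minimal sketch of the threshold computation using the base parallel package, assuming a local cluster (the worker count and cluster type are my choices, not part of the original answer):

library(parallel)

cl <- makeCluster(detectCores() - 1)

# same computation as the lapply above, spread across workers
y <- parLapply(cl, og, function(x){
  mat.tresh.seq <- seq(round(min(x), 2), round(max(x), 2), 0.01)
  z <- lapply(mat.tresh.seq, function(j) (x >= j) * 1)
  names(z) <- mat.tresh.seq
  z
})

stopCluster(cl)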

This is near instant at a maximum size of 100x100, except for writing all the individual threshold files in the last mapply. Parallelizing that step will help the most.
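
As a sketch of that last step, the final mapply could become a clusterMap call; note that clusterMap takes the function before its vector arguments, and the cluster setup here mirrors the one above (again an assumption, not part of the original answer):

library(parallel)

cl <- makeCluster(detectCores() - 1)

# distribute the threshold-file writes across workers;
# ydata is shipped to each call via MoreArgs, as in the mapply version
clusterMap(cl, function(mainfolder, filenames, ydata){
  lapply(filenames, function(x){
    filename <- paste('~/', mainfolder, '/threshhold/thresh mat ', x, '.csv', sep = '')
    write.csv(ydata[[mainfolder]][[x]], filename)
  })
}, mainfolder = seq_along(ynames), filenames = ynames, MoreArgs = list(ydata = y))

stopCluster(cl)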