R occupying virtual memory completely


I rewrote my program many times to avoid hitting any memory limits. It again takes up full VIRT, which does not make any sense to me. I do not keep any objects; I write to disk each time I am done with a calculation.

The code (simplified) looks like


 lapply(foNames, # these are just folder names like ["~/datastes/xyz","~/datastes/xyy"]
        function(foName){
     Filepath <- paste(foName, "somefile.rds", sep = "")
     CleanDataObject <- readRDS(Filepath) # reads the data

     cl <- makeCluster(CONF$CORES2USE) # spins up a cluster (it does not matter if I use the cluster or not; the problem is independent imho)

     mclapply(c(1:noOfDataSets2Generate), function(x, CleanDataObject){
                                            bootstrapper(CleanDataObject)
                                          }, CleanDataObject)
     stopCluster(cl)
 })

The bootstrap function simply samples the data and saves the sampled data to disk.

bootstrapper <- function(CleanDataObject){

   newCPADataObject <- sample(CleanDataObject)
   newCPADataObject$sha1 <- digest::sha1(newCPADataObject, algo="sha1")

   saveRDS(newCPADataObject, paste(newCPADataObject$sha1 ,".rds", sep = "") )

   return(newCPADataObject)
}

I do not get how this can now accumulate to over 60 GB of RAM. The code is highly simplified but imho there is nothing else which could be problematic. I can paste more code details if needed.

How does R manage to successively eat up my memory, even though I already rewrote the software to store the generated objects on disk?
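One mechanism worth checking in the simplified code (an observation, not something confirmed in the question): `bootstrapper()` returns the sampled object, so `mclapply()` — and the outer `lapply()` — collect every generated object into result lists that stay in memory until the loops finish. Since the object is already persisted by `saveRDS()`, a sketch that returns nothing instead would look like:

```r
# Sketch: avoid retaining bootstrap results in the apply result list.
# The object is already written to disk, so nothing needs to be returned.
bootstrapper <- function(CleanDataObject){
  newCPADataObject <- sample(CleanDataObject)
  newCPADataObject$sha1 <- digest::sha1(newCPADataObject, algo = "sha1")
  saveRDS(newCPADataObject, paste0(newCPADataObject$sha1, ".rds"))
  invisible(NULL)  # return nothing, so mclapply()/lapply() keep only NULLs
}
```

With this change the apply result lists hold only `NULL`s rather than one full copy of every bootstrap sample.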

1 Answer

I have had this problem with loops in the past. It is more complicated to address inside functions and apply() calls.

What I have done is use two things in combination to fix the problem.

Within each function that generates temporary objects, use rm(object_name) to remove the temporary object and then run gc(), which forces a garbage collection, before exiting the function. This will slow the process some, but reduce memory pressure. This way each iteration of apply will purge before moving on to the next step. With nested functions, you may have to go back to your first function to accomplish this well. It takes experimentation to figure out where the system is getting backed up.
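A minimal sketch of that pattern (the function and object names here are illustrative, not taken from the original code — `heavy_computation` stands in for whatever produces the large temporary):

```r
process_one <- function(foName){
  CleanDataObject <- readRDS(file.path(foName, "somefile.rds"))
  result <- heavy_computation(CleanDataObject)   # hypothetical worker
  saveRDS(result, file.path(foName, "result.rds"))

  rm(CleanDataObject, result)  # drop the large temporaries explicitly
  gc()                         # force a collection before returning
  invisible(NULL)              # return nothing to the enclosing apply
}
```

Each call then cleans up after itself before the next iteration of the outer apply starts.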

I find this to be especially necessary if you use any methods from packages built on rJava: they are extremely wasteful of resources, R has no way of running garbage collection on the Java heap, and most authors of Java packages do not seem to account for the need to collect in their methods.
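In that situation, R's gc() only releases the R-side references. A sketch of additionally asking the JVM itself to collect — using rJava's static-call form of .jcall — might look like this (whether it actually helps depends on the package and on the JVM's collector, which treats System.gc() as a request, not a command):

```r
library(rJava)
.jinit()  # start the JVM if it is not already running

gc()                                    # free R-side references (runs finalizers)
.jcall("java/lang/System", "V", "gc")   # request a Java-heap collection
```

Calling R's gc() first matters, because Java objects referenced from R cannot be collected on the Java side until their R proxies have been finalized.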