After many attempts, I still cannot solve this error.
First, here are my specs:
R version 4.1.3 (2022-03-10) -- "One Push-Up"
Platform: x86_64-pc-linux-gnu (64-bit)
Linux 5.13.0-37-generic #42~20.04.1-Ubuntu
2x Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
CPU cores: 24 per socket (48 total)
MemAvailable: 1374609492 kB (~1.37 TB)
The problem is simple. I am attempting to read in ~10,000 text files, each ~26 MB. Here is my code:
library(doParallel)   # also attaches parallel (clusterExport, clusterEvalQ, parLapply)

files <- list.files("/home/comp/Documents/files", recursive = FALSE, full.names = TRUE)

# parallelly (run in plain R, not RStudio)
cl <- parallelly::makeClusterPSOCK(90)
registerDoParallel(cl)
clusterExport(cl = cl, varlist = c("files"))
clusterEvalQ(cl, library("data.table"))

# read every file on the workers; parLapply returns a list of data.tables
process <- parLapply(cl, seq_along(files), function(x) {
  fread(files[x])
})
The function runs fine at first, and after about 45 minutes the parallel portion appears to be complete (i.e. the worker activity stops, as seen in htop). After that, the data is (presumably) being serialized back from the worker processes to the main R session (please excuse my ignorance if this is incorrect). This stage takes the longest (maybe another hour or so), and RAM usage climbs slowly throughout it. It is at the very end of this stage that I receive the following error:
R: Error: cons memory exhausted (limit reached?)
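For scale, here is a rough back-of-the-envelope estimate of how much data gets shipped back to the main R session (assuming, perhaps naively, that each parsed data.table is roughly the size of its source file; the real in-memory size depends on column types):

n_files <- 10000          # approximate number of files
gb_per_file <- 26 / 1024  # ~26 MB each
n_files * gb_per_file     # ~254 GB returned to the master process as one big list

So even with per-object overhead, the combined result should fit comfortably in RAM, which is why the error confuses me.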
I have made the following attempts:
mem.maxVSize(vsize = Inf)
mem.maxNSize(nsize = Inf)
Sys.setenv('R_MAX_VSIZE' = "3000Gb")
Sys.setenv('R_MAX_NSIZE' = 2e16)
Sys.setenv('R_MAX_MEM_SIZE' = Inf)
These settings do not help, even though RAM usage during this run stays below half of the machine's capacity (~530 GB of ~1.37 TB).
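In case I am simply applying these incorrectly: my (possibly wrong) understanding is that R_MAX_VSIZE and R_MAX_NSIZE are read when R starts up, so calling Sys.setenv() from inside the running session may be too late. As a sketch of what I assume is the intended way to set them (the exact value formats may be off), the equivalent entries in ~/.Renviron before launching R would be:

# ~/.Renviron (read at startup; values copied from the attempts above)
R_MAX_VSIZE=3000Gb
R_MAX_NSIZE=2e16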
Is there a BIOS setting or another R environment variable that I am missing? Any help would be greatly appreciated.