I'm trying to run PCA on two large datasets derived from the same parent dataset earlier in the script. I would like to perform the PCA on the objects in parallel, but for some reason I can't get it to work. The code block runs successfully and produces the expected output when run with a regular for loop, but each PCA takes about 1 h to run, and I'd like to take advantage of the server's capacity, as I have to do this for ~15 datasets.
This is my code:
selectObject <- function(object) {
  if (object == "scaled") {
    scaling <<- "_scaleOnly"
    pca.result <<- "pca.scaled"
    object.path <<- path.scaled.object
  }
  if (object == "scaled.regressed") {
    scaling <<- "_scale_nUMIregress"
    pca.result <<- "pca.scaled.regressed"
    object.path <<- path.scaled.regressed.object
  }
}

seurat.objects <- list(scaled = seurat.object.scale,
                       scaled.regressed = seurat.object.scale.regress)

library(foreach)
library(doParallel)

cores <- detectCores()
cl <- makeCluster(2)
doParallel::registerDoParallel(cl)

foreach(object = names(seurat.objects)) %dopar% {
  print(object)
  selectObject(object)
  print(paste(object, pca.result, scaling, pca.path))
  assign(pca.result,
         doFastPCA(t(seurat.objects[[object]]@scale.data)))
  saveRDS(pca.result,
          paste0("/path/to/pcaObject.", age, scaling, ".Rds"))
}
The above stalls forever without producing even the very first print() output, and when I cancel the process with ^C, I get the following error:
Error: "'...' used in an incorrect context"
But if I replace the foreach line with:
for (object in names(seurat.objects)) {
  [everything as above]
}
then it runs successfully, albeit sequentially.
What am I doing wrong?