Unable to manually set number of parallel workers in R "NMF" package -- only using 2 cores


I am trying to use the NMF package on a brand new M2 Ultra Mac Studio with 24 cores. I successfully installed the package (install.packages("NMF")), but then when I load it (library("NMF")), it reports detecting only 2 cores: NMF - BioConductor layer [OK] | Shared memory capabilities [OK] | Cores 2/2

Here's some reproducible code:

library("parallel")
library("foreach")
library("doParallel")
library("doMC")
library("NMF")


data <- matrix(data = runif(500000), nrow = 10000) # a 10,000 x 50 non-negative data matrix
nmf.options(verbose = TRUE,
            pbackend = 'par',
            cores = 20) # tried manually setting NMF parameters
doMC::registerDoMC(cores = 20) # tried manually registering a parallel backend with doMC

Sys.setenv("R_PACKAGE_NMF_CORES" = 20) # tried setting the environment variable that the NMF source code reads (see .onLoad below)

# # Set up parallel workers
# cl <- makeCluster(n.cores.to.use.nmf, type = "FORK") 
# registerDoParallel(cl) # tried manually registering a parallel backend with doParallel

nmf.models <- nmf(data, 
                  rank = 5:7, 
                  nrun = 30, 
                  .opt = 'vP20', # tried manually setting it to 20 cores here (v = verbose, P = force parallel, 20 = 20 cores)
                  # .pbackend = 20, # here too
                  seed = 123)

# stopCluster(cl)

This same code works on my M1 MacBook Pro with 8 cores, but I can't seem to get it to run with more than 2 cores (way too slow) on the new computer. Would really appreciate any help.
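One way to see how many workers are actually registered is to query foreach directly. This is a minimal sketch, assuming the foreach package is installed. It only reports the globally registered backend, and nmf() may set up its own backend internally, so the number is a hint rather than proof of what nmf() will use.

```r
# Report the globally registered foreach backend and its worker count.
# Guarded so the sketch also runs where foreach is not installed.
if (requireNamespace("foreach", quietly = TRUE)) {
  workers <- foreach::getDoParWorkers()  # number of workers foreach would use
  backend <- foreach::getDoParName()     # name of the registered backend, if any
  cat("backend:", if (is.null(backend)) "none" else backend,
      "| workers:", workers, "\n")
}
```

Running this right after registerDoMC() or registerDoParallel() shows whether the registration itself took effect, independently of NMF.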

Specs: NMF version 0.26, R version 4.3.1-arm64, macOS Ventura 13.5.1.

I've tried every combination of the manual overrides in that code block that I can think of, and I never get more than 2 parallel workers. I've looked a bit at the source code, and although I'm not sure, I suspect the issue is in one of two places: either here in the .onLoad function (hence my attempts to manually set the value of R_PACKAGE_NMF_CORES above)...

.onLoad <- function(libname, pkgname) {
        
    # set default number of cores
    if( isCHECK() ){
        options(cores=2)
    }else{
        if( nchar(nc <- Sys.getenv('R_PACKAGE_NMF_CORES')) > 0 ){
            try({
                nmf.options(cores=as.numeric(nc))
            })
        }   
    }

...or in this getMaxCores function in the Parallel.R file (/NMF/R/Parallel.R), which seems to just set the number of cores to 2 regardless:

# Definitions used in the parallel computations of NMF
#
# - reproducible backend
# - reproducible %dopar% operator: %dorng%
# 
# Author: Renaud Gaujoux
# Creation: 08-Feb-2011
###############################################################################

#' @include utils.R
#' @import foreach
#' @import doParallel
NULL

# returns the number of cores to use in all NMF computation when no number is
# specified by the user
getMaxCores <- function(limit=TRUE){
    #ceiling(parallel::detectCores()/2)
    nt <- n <- parallel::detectCores()
    # limit to number of cores specified in options if asked for

    if(n > 2) n <- 2

    # forces limiting maximum number of cores to 2 during CRAN checks
    if( n > 2 && isCHECK() ){
        message("# NOTE - CRAN check detected: limiting maximum number of cores [2/", nt, "]")
        n <- 2L
    }
    n
}
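To make the effect of that line concrete, here is a small stand-in for the quoted logic. The function name cap_cores_sketch and its detected argument are hypothetical; detected replaces the call to parallel::detectCores() so the logic can be tried with any core count. Because n is already capped at 2 before the CRAN check, the n > 2 && isCHECK() branch in the quoted code can never run.

```r
# Hypothetical stand-in for the quoted getMaxCores(); `detected` takes the
# place of parallel::detectCores().
cap_cores_sketch <- function(detected) {
  n <- detected
  if (n > 2) n <- 2  # the unconditional cap from the quoted source
  n
}

# A 2-core, an 8-core and a 24-core machine all end up with 2.
capped <- sapply(c(2, 8, 24), cap_cores_sketch)
print(capped)
```

If the quoted code is what is installed, the default would be 2 on both the 8-core laptop and the 24-core Mac Studio, which suggests the two machines may not be running the same NMF version.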

I don't get why neither of these would be a problem on my laptop, where the parallel code works as expected, but they are the only candidates I can think of. Any ideas? Thanks in advance.
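One detail about the first candidate: the quoted .onLoad reads R_PACKAGE_NMF_CORES at the moment the package is loaded, so the variable would need to be set before library("NMF"), not after it as in the code above. A minimal sketch, assuming a fresh R session in which NMF has not been loaded yet. Whether this is enough to get past the cap in getMaxCores in version 0.26 is not confirmed here.

```r
# Set the variable first; .onLoad runs only once, when the package is loaded.
Sys.setenv(R_PACKAGE_NMF_CORES = "20")

# Guarded so the sketch also runs where NMF is not installed.
if (requireNamespace("NMF", quietly = TRUE)) {
  library("NMF")
  # Read back the option that .onLoad is expected to have set.
  print(nmf.getOption("cores"))
}
```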

1 answer

Adam Morgan:

For anyone having the same problem, I solved it by installing the previous version of NMF: 0.25. The problem seems to be specific to version 0.26. Here's some code to fix it:

# install.packages("devtools")
library("devtools")
install_version("NMF", version = "0.25")

When prompted, I opted not to update any other packages (option 3: None). I'm not sure whether this mattered.
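To confirm the downgrade took effect, and to compare the two machines, it can help to print the installed NMF version next to the detected core count. A small sketch, to be run in a fresh R session after installing; the variable name n_detected is just for illustration.

```r
# Print the installed NMF version, if the package is present.
if (requireNamespace("NMF", quietly = TRUE)) {
  print(packageVersion("NMF"))
}

# Number of cores R itself detects on this machine.
n_detected <- parallel::detectCores()
print(n_detected)
```

If the laptop reports an older NMF version than the Mac Studio, that would be consistent with the problem being specific to 0.26.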