I am performing group-based trajectory modeling in R using the lcmm package. This is my code:
fitGbtmql <- function(k) {
  start <- Sys.time()
  # grid search: 5 sets of random initial values, 5 iterations each,
  # seeded from the one-class model, run on 10 cores
  model <- do.call(gridsearch, list(m = makeGbtmCallql(k), rep = 5, maxiter = 5,
                                    minit = gbtm_modelsql[['1']], cl = 10))
  model$runTime <- Sys.time() - start
  return(model)
}
where makeGbtmCallql(k) builds an hlme model call with k latent classes.
My dataset is very large, around 4.3 million rows. The grid search is, of course, exceptionally time consuming, and the 10-core parallel processing doesn't seem to help very much. I could raise the core count, even to 20, but I doubt it would make a big difference.
If it is any help, I believe the cl argument for parallel processing is based on this code:
gridsearch.parallel <- function(m, rep, maxiter, minit, cl = NULL)
{
  if (!is.null(cl)) {
    clusterSetRNGStream(cl)
    mc <- match.call()$m
    mc$maxiter <- maxiter
    assign("minit", eval(minit))
    clusterCall(cl, function() require(lcmm))
    clusterExport(cl, list("mc", "maxiter", "minit",
                           as.character(as.list(mc[-1])$data)),
                  envir = environment())
    cat("Be patient, grid search is running ...\n")
    models <- parLapply(cl, 1:rep, function(X) {
      mc$B <- substitute(random(minit), parent.frame(n = 2))
      return(do.call(as.character(mc[[1]]), as.list(mc[-1])))
    })
    llmodels <- sapply(models, function(x) { return(x$loglik) })
    kmax <- which.max(llmodels)
    mc$B <- models[[kmax]]$best
    mc$maxiter <- NULL
    cat("Search completed, performing final estimation\n")
    return(do.call(as.character(mc[[1]]), as.list(mc[-1])))
  }
  return(do.call(gridsearch, as.list(match.call()[2:5])))
}
(source: https://github.com/CecileProust-Lima/lcmm/issues/39)
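For context, here is a minimal, self-contained sketch of how I understand a gridsearch call looks when written out directly. The data here is a toy simulation standing in for my real dataset (my actual call is built by makeGbtmCallql, which I haven't shown), so the variable names and small rep/cl values are just for illustration:

```r
library(lcmm)
set.seed(1)

# Toy data: 50 subjects observed at 4 time points (placeholder for my
# real 4.3-million-row dataset)
mydata <- data.frame(id   = rep(1:50, each = 4),
                     time = rep(0:3, 50))
mydata$outcome <- 1 + 0.5 * mydata$time + rnorm(nrow(mydata))

# One-class model whose estimates seed the random initial values
m1 <- hlme(outcome ~ time, subject = "id", ng = 1, data = mydata)

# gridsearch runs 'rep' short estimations (each capped at 'maxiter'
# iterations) from perturbed initial values, then fully refits the best
# one; cl = 2 spreads the rep fits over 2 worker processes
m2 <- gridsearch(hlme(outcome ~ time, mixture = ~ time, subject = "id",
                      ng = 2, data = mydata),
                 rep = 3, maxiter = 5, minit = m1, cl = 2)
```

My real call is the same shape, just with rep = 5, cl = 10, and the full data.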
I was wondering whether there is another package that utilizes the GPU, or some other way to speed this up on the CPU (I read some things about hyperthreading, and perhaps turning it off, or something). I am very new to both R and coding, and definitely a big noob in machine learning, so any advice would be very helpful.
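Regarding the hyperthreading point, this is how I have been checking core counts (a sketch using the base parallel package; the expected numbers in the comment are my assumption for the i9-13900K):

```r
library(parallel)

# logical = TRUE counts hardware threads (hyperthreads included),
# logical = FALSE counts physical cores; on an i9-13900K these should
# be 32 and 24 respectively
detectCores(logical = TRUE)
detectCores(logical = FALSE)
```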
My hardware: i9-13900K, RTX A4000, 64 GB DDR5.