How to make mclapply in Rscript maximize use of all available Linux cores?


I'm reading in a parquet file with ~1 million rows, wrangling each row, and writing out CSVs. The data wrangling itself is quite simple: for each UserID I select all of its rows (there are several per UserID, in random order within the dataframe) and write them out to that UserID's individual CSV. But since there are so many rows, the script runs for ~5 hours, and I have hundreds of parquet files overall, so I need to parallelize. I used the mclapply() function to parallelize by UserID. The script runs successfully, but is barely faster than when I run it with a single core.

I opened the command line and ran htop and confirmed that each core is at 5% or less utilization while running this script. When I initially run the script, each core is 100% utilized, but a few minutes later the utilization plummets. How can I ensure CPUs are used efficiently with mclapply? I've tried increasing the mc.cores argument from 16 to 100 and I get the same problem every time. I'm on a Linux Ubuntu VM with 16 cores and 128 GB of RAM, but I can adjust the settings to give myself more cores and/or memory.
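For reference, here is a minimal, self-contained sketch of the kind of approach described above (this is an illustration, not the actual script: it uses toy inline data in place of the real parquet file, and only the UserID column name comes from the question; the output directory and other names are hypothetical). The key idea is to split the data.frame once by UserID up front, so each mclapply() worker only writes its chunk rather than re-scanning the full dataframe:

```r
library(parallel)  # mclapply() forks the R process; Linux/macOS only

# Toy stand-in for the real parquet data: several rows per UserID,
# in random order within the dataframe.
df <- data.frame(UserID = c("a", "b", "a", "c", "b", "a"),
                 value  = 1:6,
                 stringsAsFactors = FALSE)

out_dir <- tempdir()

# Split ONCE into a named list of per-user data.frames,
# instead of filtering the full df inside each worker.
by_user <- split(df, df$UserID)

# Each worker writes one user's rows to its own CSV.
invisible(mclapply(names(by_user), function(uid) {
  write.csv(by_user[[uid]],
            file = file.path(out_dir, paste0(uid, ".csv")),
            row.names = FALSE)
}, mc.cores = 2))
```

Note that a job like this is often disk-I/O-bound rather than CPU-bound, which would be consistent with cores sitting near-idle in htop even though mclapply() has forked the requested number of workers.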
