I'm reading in a parquet file with ~1 million rows, wrangling the rows, and writing out CSVs. The wrangling itself is quite simple: for each UserID, I select all of its rows (there are several per UserID, in random order within the data frame) and write them out to that UserID's own CSV. But because there are so many rows, the script runs for ~5 hours. I have hundreds of parquet files overall, so I need to parallelize.
I used the mclapply() function to parallelize by UserID. The script runs successfully but is barely faster than when I run it on a single core. I ran htop on the command line and confirmed that each core is utilizing only 5% or less of its available memory on this script. When I first start the script, each core is 100% utilized, but a few minutes later the utilization plummets. How can I ensure the CPUs are used efficiently with mclapply? I've tried increasing the mc.cores argument from 16 to 100 and I get the same problem every time. I'm on a Linux Ubuntu VM with 16 cores and 128 GB of memory, but I can adjust the settings to give myself more cores and/or memory.
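For reference, the per-UserID workflow described above can be sketched roughly as follows. This is a minimal illustration and not the original script: the toy data frame stands in for the parquet file, and the column name `value`, the output directory, and the cap of 4 cores are all assumptions. It splits the data by UserID once up front, so each task receives only its own rows rather than filtering the full data frame for every UserID; whether that addresses the slowdown in the question is not something this sketch establishes.

```r
library(parallel)

# Toy stand-in for the parquet data (assumption: columns UserID and value).
set.seed(1)
df <- data.frame(
  UserID = sample(sprintf("user%03d", 1:50), 1000, replace = TRUE),
  value  = rnorm(1000)
)

# Hypothetical output directory for the per-user CSVs.
out_dir <- file.path(tempdir(), "per_user_csv")
dir.create(out_dir, showWarnings = FALSE)

# Split once: a named list with one data frame per UserID.
groups <- split(df, df$UserID)

# Pick a worker count; forking is unavailable on Windows, so fall back to 1.
n_detected <- detectCores()
if (is.na(n_detected)) n_detected <- 1L
n_cores <- if (.Platform$OS.type == "windows") 1L else min(4L, n_detected)

# Each task writes one UserID's rows to its own CSV and returns the path.
written <- mclapply(names(groups), function(id) {
  path <- file.path(out_dir, paste0(id, ".csv"))
  write.csv(groups[[id]], path, row.names = FALSE)
  path
}, mc.cores = n_cores)

written <- unlist(written)
```

With real data, the toy data frame would be replaced by whatever the parquet reader returns, and `mc.cores` would be set to match the cores actually available on the VM.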
How to make mclapply in Rscript maximize use of all available Linux cores?
Asked by November2Juliet (113 views)