The documentation mentions, "When parallelizing nested for loops, there is always a question of which loop to parallelize. The standard advice is to parallelize the outer loop."
However, I did not find any concrete examples of parallelizing the outer loop while running the inner loop sequentially. In that case, would it be appropriate to use `foreach` + `%dopar%` for the outer loop and a regular `for` loop for the inner one?
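Here is a minimal sketch of that pattern: a parallel outer `foreach`, with a sequential inner `for` inside each task. It assumes the `foreach` and `doParallel` packages are installed and two local cores are available. It also uses a hypothetical toy `sim()` (`10 * a + b`) and small example `avec`/`bvec` in place of the real ones.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)      # assumption: 2 local cores

sim  <- function(a, b) 10 * a + b  # hypothetical toy stand-in for sim()
avec <- 1:3
bvec <- 1:4

# Outer loop in parallel: each task handles one value of b ...
x_outer <- foreach(b = bvec, .combine = "cbind") %dopar% {
  col <- numeric(length(avec))
  # ... and the inner loop runs sequentially inside that task.
  for (i in seq_along(avec)) {
    col[i] <- sim(avec[i], b)
  }
  col
}

stopImplicitCluster()
```

With this layout only `length(bvec)` tasks exist, so at most that many workers can ever be busy.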
Furthermore:
```r
x <-
  foreach(b = bvec, .combine = 'cbind') %:%
    foreach(a = avec, .combine = 'c') %dopar% {
      sim(a, b)
    }
```
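To experiment with the snippet above, here is a runnable sketch. It assumes `foreach` and `doParallel` are installed and two local cores are available. A hypothetical toy `sim()` stands in for the real simulation.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)      # assumption: 2 local cores

sim  <- function(a, b) 10 * a + b  # hypothetical toy stand-in for sim()
avec <- 1:3
bvec <- 1:4

# %:% merges both loops into one stream of tasks, one per (a, b) pair
x <- foreach(b = bvec, .combine = "cbind") %:%
  foreach(a = avec, .combine = "c") %dopar% {
    sim(a, b)
  }

stopImplicitCluster()

dim(x)  # 3 rows (one per value of a) by 4 columns (one per value of b)
```

The result is a matrix whose rows follow `avec` and whose columns follow `bvec`, so `x[i, j]` holds `sim(avec[i], bvec[j])`.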
This code uses nested `foreach` loops. Does that mean both the inner and the outer loop are parallelized? This is a bit confusing to me.
I have two interpretations:

1. The task "take one value from `bvec`, such as 1, and iterate over all values in `avec`" is split among multiple threads that run simultaneously. Then the task "take 2 from `bvec` and iterate over all values in `avec`" is likewise split among multiple threads, and so on.
2. Multiple threads execute `sim(a, b)` concurrently across all (a, b) combinations. If that is the case, couldn't the results get mixed up when they are collected? (Although it seems that the order does not matter here.)
Which interpretation is correct?
Q1.
`%:%` essentially transforms multiple loops into a single one. It is like first computing the full combination of `bvec` and `avec` and then looping over all of those pairs in one go. This is different from a standard nested loop, in which an outer loop runs an inner parallel loop for each value of `bvec`.
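The two shapes can be sketched as follows. The sketch assumes the `foreach` and `doParallel` packages, two local cores, and a hypothetical toy `sim()` (`10 * a + b`).

The first part flattens the loops by hand with `expand.grid()` and runs a single loop over every (a, b) pair. This illustrates the idea behind `%:%`, not how `foreach` implements it internally. The second part is a standard nested loop whose outer `%do%` loop runs sequentially.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)      # assumption: 2 local cores

sim  <- function(a, b) 10 * a + b  # hypothetical toy stand-in for sim()
avec <- 1:3
bvec <- 1:4

# Flattened by hand: list every (a, b) pair first, then run ONE parallel loop.
# expand.grid() varies `a` fastest, so the results come out in column-major order.
grid   <- expand.grid(a = avec, b = bvec)
flat   <- foreach(a = grid$a, b = grid$b, .combine = "c") %dopar% sim(a, b)
x_flat <- matrix(flat, nrow = length(avec))  # rows follow avec, columns follow bvec

# Standard nested loop: the outer %do% loop is sequential, so each inner
# %dopar% loop over avec must finish before the next value of b starts.
x_nested <- foreach(b = bvec, .combine = "cbind") %do% {
  foreach(a = avec, .combine = "c") %dopar% sim(a, b)
}

stopImplicitCluster()
```

Both versions produce the same values. In the nested version, however, only `length(avec)` tasks are available to the workers at any one time.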
In the standard nested loop, the parallelism happens only in the inner loop. The outer loop must wait for each value in `bvec` to finish and be combined before moving on to the next value in `bvec`. Therefore, the loop with more iterations should be the one that runs in parallel (here the inner loop over `avec`) to get good efficiency. In the extreme case where `avec` has only one value, the whole process is effectively sequential.

Now back to your question about parallelizing the outer loop and running the inner loop sequentially. The right choice really depends on the number of iterations at each level of the loop, not simply on whether a loop is the outer or the inner one. This is why `%:%` is useful here: it converts the nested loops into a single flat loop regardless, so you don't need to think about iterator sizes and nesting levels.

Q2.

Yes, multiple workers execute `sim(a, b)` concurrently. To store the results, `foreach` first sets up a `list` whose length equals the total number of (nested) iterations, i.e. `length(avec) * length(bvec)`, so a slot is reserved for every result. The results can then go into the right place in the list at any time, and there won't be any