Recently, I came across Chapel. I liked the examples given in the tutorials, but many of them looked embarrassingly parallel to me. I work on scattering problems in many-body quantum physics, where a common problem can be reduced to the following:
- A tensor `A` of shape `M x N x N` is filled with the solution of a matrix equation for `M` different parameters `1..M`.
- A subset of the tensor `A` is needed to compute a correction term for each of the parameters `1..M`.
The first part of the problem is embarrassingly parallel.
My question is therefore whether, and how, it is possible to transfer only the needed subset of the tensor `A` to each of the locales of a cluster while minimizing the necessary communication.
When Chapel is doing its job right, transfers of array slices between distributed and local arrays (say) should be performed in an efficient manner. This means that you should be able to write such tensor-subset transfers using Chapel's array slicing notation.
For example, here's one way to write such a pattern:
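A minimal sketch of that pattern, under some assumptions: it uses the `blockDist.createDomain` spelling from recent Chapel releases (releases around 1.23 wrote `dmapped Block(boundingBox=...)` instead), and the array size `n` and the slice origin `(x, y, z)` are placeholders chosen only for illustration.

```chapel
use BlockDist;

// Hypothetical size and slice origin, chosen only for illustration.
config const n = 16;
config const x = 2, y = 3, z = 4;

// A Block-distributed 3D array standing in for myDistArray.
const Space = {1..n, 1..n, 1..n};
const DistSpace = blockDist.createDomain(Space);
var myDistArray: [DistSpace] real;

// Fill the array with values that encode each element's index.
forall (i, j, k) in DistSpace with (ref myDistArray) do
  myDistArray[i, j, k] = i + j / 100.0 + k / 100000.0;

// A non-distributed domain describing a 5 x 7 x 3 index set anchored at (x, y, z).
const Slice = {x..#5, y..#7, z..#3};

// Create a new array variable holding copies of the elements of
// myDistArray that Slice describes.
var myLocalArray = myDistArray[Slice];

writeln(myLocalArray.domain);
```

Run on a single locale, this behaves like an ordinary slice copy; the distribution only matters once the program is launched with more than one locale.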
The new variable `myLocalArray` will be an array whose elements are copies of the ones in `myDistArray` as described by the indices in `Slice`. The domain of `myLocalArray` will be the slicing domain `Slice`, so since `Slice` is a non-distributed domain, `myLocalArray` will also be a local / non-distributed array, and therefore won't incur any of the overheads of using Chapel's distributed array notation when it's operated on from the current locale.

To date, we have focused principally on optimizing such transfers for Block-distributed arrays. For example, for cases like the one above, when `myDistArray` is Block-distributed, I'm seeing a fixed number of communications between the locales as I vary the size of the slice (though the size of those communications would obviously vary depending on the number of elements that need to be transferred). Other cases and patterns are known to need more optimization work, so if you find a case that isn't performing / scaling as you'd expect, please file a Chapel GitHub issue against it to help alert us to your need and/or help you find a workaround.
So, sketching out the pattern you describe, I might imagine doing something like:
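One possible sketch of the two-step pattern, with several assumptions labeled in the comments: the sizes `M` and `N` are placeholders, step 1 is replaced by a dummy fill, the slice taken for each parameter (the whole `m`-th plane) is only a stand-in for whatever subset you actually need, and the "correction term" is just a sum. It again uses the `blockDist.createDomain` spelling of recent Chapel releases.

```chapel
use BlockDist;

config const M = 8, N = 6;   // hypothetical problem sizes

// Local and Block-distributed versions of the complete tensor index space.
const LocTensorSpace = {1..M, 1..N, 1..N};
const TensorSpace = blockDist.createDomain(LocTensorSpace);

// A stores the result of step 1.
var A: [TensorSpace] real;

// Step 1 placeholder: the real code would solve the matrix equation here.
forall (m, i, j) in TensorSpace with (ref A) do
  A[m, i, j] = m + i / 100.0 + j / 100000.0;

// A 1D distributed parameter space to drive step 2;
// each locale iterates over its own subset of 1..M.
const ParameterSpace = blockDist.createDomain({1..M});
var correction: [ParameterSpace] real;

forall m in ParameterSpace with (ref correction) {
  // A local domain naming the indices of A needed for parameter m.
  // Taking the whole m-th plane is an assumption for illustration only.
  const TensorSlice = {m..m, 1..N, 1..N};

  // Copy those elements into a local array.
  const locTensor = A[TensorSlice];

  // Placeholder correction term: the sum of the copied elements.
  correction[m] = + reduce locTensor;
}

writeln(correction);
```

The point of the design is that all remote data for parameter `m` is fetched by the single slice copy, so the computation on `locTensor` afterwards is purely local.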
Some other things that seem related to me, but which I don't want to bog this question down with are:
(So feel free to ask follow-up questions if these are of interest)
Finally, for the sake of posterity, here's the program I wrote up while I was putting this response together to make sure I'd get the behavior I expected in terms of numbers of communications and getting a local array (this was with `chpl version 1.23.0 pre-release (ad097333b1)`, though I'd expect the same behavior for recent releases of Chapel):
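A sketch of such a check, not the original listing: it counts communications around the slice copy with the standard `CommDiagnostics` module. The sizes, the slice extent `(xs, ys, zs)` and the slice origin are assumed values, and the `blockDist.createDomain` spelling assumes a recent Chapel release (1.23 would use `dmapped Block(boundingBox=...)`).

```chapel
use BlockDist, CommDiagnostics;

config const M = 10, N = 20;   // hypothetical tensor sizes

const LocTensorSpace = {1..M, 1..N, 1..N};
const TensorSpace = blockDist.createDomain(LocTensorSpace);

var A: [TensorSpace] real;

// Fill A with values that encode each element's index.
forall (i, j, k) in TensorSpace with (ref A) do
  A[i, j, k] = i + j / 100.0 + k / 100000.0;

// Size of the slice, and an origin that roughly centers it in A.
config const xs = 5, ys = 7, zs = 3;
config const x = M/2 - xs/2, y = N/2 - ys/2, z = N/2 - zs/2;

const Slice = {x..#xs, y..#ys, z..#zs};

writeln("Copying a ", (xs, ys, zs), " slice of A from ", (x, y, z));

// Count only the communication caused by the slice copy.
resetCommDiagnostics();
startCommDiagnostics();

var myLocArr = A[Slice];

stopCommDiagnostics();

// One record of communication counts per locale.
writeln(getCommDiagnostics());

writeln(myLocArr.domain);
```

Varying `--xs`, `--ys` and `--zs` on the command line while running on several locales is one way to see whether the number of communications stays fixed as the slice grows.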