Is MPI_ACCUMULATE with MPI_REPLACE always a better option than MPI_PUT?

Asked by Yash on 05 March 2023 at 16:29

I was going through the accumulate and atomic MPI RMA calls introduced in MPI-3. I found that there is an MPI_REPLACE operator which can be used with MPI_ACCUMULATE to get functionality similar to MPI_PUT. From what I understood, concurrent MPI_ACCUMULATE calls are not erroneous, unlike concurrent MPI_PUT calls. That is why, whenever I want to update data in my application, I take an exclusive lock (MPI_LOCK_EXCLUSIVE) around the MPI_PUT. But this causes severe performance degradation, because even updates to different memory locations on the target process happen sequentially. Since a shared lock (MPI_LOCK_SHARED) is valid with MPI_ACCUMULATE, is using MPI_ACCUMULATE with MPI_REPLACE inside a shared lock always a better alternative than using MPI_PUT with an exclusive lock? Or am I misunderstanding something? Similarly, on a minor note, is MPI_GET_ACCUMULATE with MPI_NO_OP always better than MPI_GET?

So basically my question is: would replacing all MPI_PUT calls that are currently synchronized by an exclusive lock with MPI_ACCUMULATE using MPI_REPLACE, synchronized by a shared lock, be a valid and better alternative, given that it removes the need to take an exclusive lock on the whole window of the target process?


Answered by Jeff Hammond on 04 April 2023 at 09:53

MPI_ACCUMULATE with MPI_REPLACE is an atomic put. It is neither better nor worse than MPI_PUT in general, but it is almost certainly better than MPI_PUT under an exclusive lock when one requires element-wise atomicity.

The recommended model for MPI-3 RMA is to use MPI_WIN_LOCK_ALL for the lifetime of the window, and use element-wise RMA operations or some form of mutual exclusion (mentioned in https://stackoverflow.com/a/75927929/2189128) for anything else.

Use MPI_WIN_FLUSH(_LOCAL)(_ALL) to achieve the appropriate synchronization without terminating the epoch. Use _LOCAL versions if you only care about reusing the buffer, or if the RMA operation has round-trip semantics (e.g. get, get_accumulate, fetch_and_op, compare_and_swap). Use _ALL versions to complete at all targets in the window, as opposed to just one.
