Do threads in a warp execute instructions in sync? What about threads in a block?

Question

418 Views Asked by Pippi At 06 June 2025 at 15:31

I have a CUDA kernel in which each address of a global array is written to 4 times by each thread. It seems that two threads in a warp can potentially write to the same address at the same time, causing the process to be non-coalesced. What about threads in a block?

    d_next_front[i*width + j+1] = 1;
    d_next[i*width + j-1] = 1; 
    d_next[(i+1)*width + j] = 1;
    d_next[(i-1)*width + j] = 1;


**Robert Crovella** · Answer 1

Do threads in a warp execute instructions in sync?

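Yes: the threads of a warp execute each instruction together, in sync. (On Volta and newer GPUs, independent thread scheduling allows the threads of a warp to diverge and reconverge, so code whose correctness depends on warp-level synchronization should call `__syncwarp()` explicitly.)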

What about threads in a block?

No: A block is composed of one or more warps. The warps in a block are not necessarily synchronized, and usually won't be.
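
To illustrate why this matters, here is a minimal sketch (not from the original question) in which threads read data written by threads in other warps of the same block. The kernel name `reverse_tile`, the constant `TILE`, and the host helper `reverse_tiles_on_device` are all hypothetical names chosen for the example. Because the warps of a block are not synchronized with each other, the block-wide barrier `__syncthreads()` is needed between the write and the read.

```cuda
#include <cuda_runtime.h>

// Hypothetical tile size: 128 threads per block, i.e. 4 warps of 32 threads.
constexpr int TILE = 128;

// Each block reverses its own TILE-sized chunk of `data` in place.
// Assumes blockDim.x == TILE and the array length is a multiple of TILE.
__global__ void reverse_tile(int *data)
{
    __shared__ int tile[TILE];
    int t = threadIdx.x;
    int base = blockIdx.x * TILE;

    tile[t] = data[base + t];

    // tile[TILE - 1 - t] was, in general, written by a thread in a
    // different warp. Warps in a block are not synchronized with each
    // other, so a block-wide barrier is required before reading it.
    __syncthreads();

    data[base + t] = tile[TILE - 1 - t];
}

// Hypothetical host helper: runs the kernel on n ints, returns true on success.
bool reverse_tiles_on_device(int *host, int n)
{
    if (n <= 0 || n % TILE != 0) return false;
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev, host, n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverse_tile<<<n / TILE, TILE>>>(dev);
        // A blocking cudaMemcpy waits for the kernel on the default stream.
        ok = cudaMemcpy(host, dev, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Without the `__syncthreads()` call, a thread could read a `tile` element before the warp responsible for it has written it.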

Coalescing refers to the memory accesses associated with a single instruction executed by a single warp. (There is no concept of coalescing that applies across multiple warps.) If all of the addresses accessed by the threads of that warp lie within a single naturally aligned 128-byte segment of global memory, the accesses will "coalesce" within the memory controller and be satisfied by a single transaction to memory. There are many nuances to this behavior, but the statement is generally instructive for cc2.0 (compute capability 2.0) and newer devices. There may be additional caveats to consider for pre-cc2.0 devices.
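
The 128-byte-segment rule can be explored with a small host-side model. This is a deliberately simplified sketch, not a description of what any particular memory controller does: it assumes 4-byte elements and a 32-thread warp, and simply counts how many distinct naturally aligned 128-byte segments one warp-wide access would touch. The names `segments_touched`, `write_contiguous`, and `write_strided` are hypothetical.

```cuda
#include <cstdint>
#include <set>

constexpr int WARP_SIZE = 32;
constexpr std::uintptr_t SEGMENT_BYTES = 128;

// Simplified host-side model: lane k of one warp accesses a 4-byte element
// at byte address base_bytes + k * stride_elems * 4. Returns the number of
// distinct naturally aligned 128-byte segments covered by those addresses.
int segments_touched(std::uintptr_t base_bytes, int stride_elems)
{
    std::set<std::uintptr_t> segments;
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        std::uintptr_t addr = base_bytes +
            static_cast<std::uintptr_t>(lane) * stride_elems * sizeof(float);
        segments.insert(addr / SEGMENT_BYTES);
    }
    return static_cast<int>(segments.size());
}

// Device-side counterparts of the two patterns modelled above.
// Adjacent threads write adjacent elements (stride 1).
__global__ void write_contiguous(float *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = 1.0f;
}

// Adjacent threads write elements `stride` apart.
__global__ void write_strided(float *out, int stride)
{
    out[(blockIdx.x * blockDim.x + threadIdx.x) * stride] = 1.0f;
}
```

Under these assumptions, `segments_touched(0, 1)` is 1 (the whole warp fits in one aligned segment), `segments_touched(0, 2)` is 2, `segments_touched(0, 32)` is 32, and a contiguous but misaligned access such as `segments_touched(64, 1)` straddles 2 segments.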

It's impossible to look at your code and determine whether the resultant transactions will coalesce. It would be necessary to know the definitions of i and j, especially as they relate to the built-in thread ID variables such as threadIdx.x. Likewise, understanding whether two threads in a warp could write to the same location at the same time would require looking at much more code than what you have shown.
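
As an illustration only, here is one possible mapping, which is an assumption and not something the question states: `j` (the column) is derived from `threadIdx.x` and `i` (the row) from `threadIdx.y`. The sketch also assumes a single array `d_next` and adds bounds checks that the original snippet does not show. The kernel name `mark_neighbors` and the host helper `run_mark_neighbors` are hypothetical.

```cuda
#include <cuda_runtime.h>

// ASSUMPTION: j comes from threadIdx.x and i from threadIdx.y. The question
// does not show how i and j are actually defined.
__global__ void mark_neighbors(int *d_next, int width, int height)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i >= height || j >= width) return;

    // With this mapping, lanes with consecutive threadIdx.x have consecutive
    // j, so within each single store instruction those lanes write to
    // consecutive, distinct addresses.
    if (j + 1 < width)  d_next[i * width + j + 1] = 1;
    if (j - 1 >= 0)     d_next[i * width + j - 1] = 1;
    if (i + 1 < height) d_next[(i + 1) * width + j] = 1;
    if (i - 1 >= 0)     d_next[(i - 1) * width + j] = 1;
}

// Hypothetical host helper: runs the kernel over a width x height grid.
bool run_mark_neighbors(int *host, int width, int height)
{
    size_t bytes = static_cast<size_t>(width) * height * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemset(dev, 0, bytes);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    mark_neighbors<<<grid, block>>>(d_next_arg(dev), width, height);
    bool ok = cudaMemcpy(host, dev, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}

// Trivial pass-through kept only so the launch line above reads clearly.
inline int *d_next_arg(int *p) { return p; }
```

With this particular mapping, the same address is written by different threads in different statements (for example, thread `j` via `j+1` and thread `j+2` via `j-1`), but every such write stores the same value, 1, so the final contents do not depend on the order of the writes. A different definition of i and j could change both the coalescing behavior and the collision pattern.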
