HLSL Compute Shader Race Condition

I have some experience with compute shaders in HLSL. I'm currently developing a tool for the Unity engine that does something like texture baking: it takes a low-poly mesh and casts rays from its surface onto a high-poly mesh.

I've run into a race condition and don't yet know how to solve it.

Algorithm Description:

My shader receives a 2D input image, _PositionMap, each pixel of which contains the ray origin coordinates on the low-poly surface. The goal of the shader is to raycast against the high-poly surface and fill the buffer RWStructuredBuffer<HitInfo> _HitInfo. The buffer has one element per _PositionMap pixel and stores the high-poly surface data for that pixel: the distance from the low-poly surface (depth, initialized to +infinity before the shader is executed), the triangle index, and the barycentric coordinates.
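The +infinity initialization mentioned above could be done on the GPU with a small clearing kernel. This is only a sketch under assumptions: the kernel name Clear, the row-major index, and the 0xFFFFFFFF "no triangle" marker are not part of my actual code.

```hlsl
#pragma kernel Clear

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;    // position map width in pixels
uint _Height;   // position map height in pixels

[numthreads(16, 16, 1)]
void Clear(uint3 id : SV_DispatchThreadID)
{
    // skip threads that fall outside the map when its size is not a multiple of 16
    if (id.x >= _Width || id.y >= _Height)
        return;

    HitInfo miss;
    miss.depth = asfloat(0x7F800000u);      // IEEE 754 bit pattern of +infinity
    miss.barycentric = float2(0.0, 0.0);
    miss.triangleIndex = 0xFFFFFFFFu;       // assumed "no triangle" marker

    // assumed row-major layout: one row holds _Width elements
    _HitInfo[id.y * _Width + id.x] = miss;
}
```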

Current implementation:

Threads run in parallel across all _PositionMap pixels. Each thread iterates over all the triangles and checks whether its ray hits each one. If the hit point is closer than what is already recorded in _HitInfo[pixelIndex], the record is overwritten. Each pixel is handled by exactly one thread, so there is no race condition. This is what the algorithm looks like now:

#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

// mesh buffers declarations
Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;                            // position map width (total threads along X)
uint _Height;                           // position map height (total threads along Y)
uint _Triangles;                        // mesh triangle count

[numthreads(16, 16, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    uint pixelIndex = id.y * _Width + id.x; // row-major: each row holds _Width pixels

    // iterating over each triangle in the mesh
    for (uint i = 0; i < _Triangles; i++)
    {
        // performing raycast from _PositionMap[id.xy] against the triangle at index i
        HitInfo triangleHitInfo = ...
            
        // performing depth test
        bool depthTest = triangleHitInfo.depth <= _HitInfo[pixelIndex].depth;
            
        if (depthTest)
        {
            // overwriting previously stored data
            _HitInfo[pixelIndex] = triangleHitInfo;
        }
    }
}
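As a side note, the same single-thread-per-pixel loop can keep the closest hit in a local variable and write the buffer only once. The following is a sketch, not my production code: RaycastTriangle is a hypothetical placeholder for the raycast elided above (it always reports a miss here), the index is assumed to be row-major, and the comparison is strict so that a miss never replaces the stored value.

```hlsl
#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

Texture2D<float4> _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;
uint _Height;
uint _Triangles;

// Hypothetical placeholder for the elided ray/triangle test.
// It always returns a miss (+infinity depth); the real test goes here.
HitInfo RaycastTriangle(float3 origin, uint triangleIndex)
{
    HitInfo hit;
    hit.depth = asfloat(0x7F800000u);
    hit.barycentric = float2(0.0, 0.0);
    hit.triangleIndex = triangleIndex;
    return hit;
}

[numthreads(16, 16, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    // skip threads outside the position map
    if (id.x >= _Width || id.y >= _Height)
        return;

    uint pixelIndex = id.y * _Width + id.x;     // assumed row-major layout
    float3 origin = _PositionMap[id.xy].xyz;

    // start from the stored value (+infinity depth after initialization)
    HitInfo best = _HitInfo[pixelIndex];

    for (uint i = 0; i < _Triangles; i++)
    {
        HitInfo hit = RaycastTriangle(origin, i);

        // keep the closest hit in a register instead of re-reading the buffer
        if (hit.depth < best.depth)
        {
            best = hit;
        }
    }

    // single write per pixel
    _HitInfo[pixelIndex] = best;
}
```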

Updated implementation:

I think performance could be improved by parallelizing the loop across the third thread dimension. Each thread would then perform a single raycast against its corresponding triangle.

With this approach, multiple threads read and write _HitInfo[pixelIndex] concurrently. I tried to serialize that access by introducing an additional buffer, RWBuffer<uint> _ZCounters. It has one element per _PositionMap pixel and is initialized with zeros before the shader is executed. This is what the updated algorithm looks like:

#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

// mesh buffers declarations
Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
RWBuffer<uint> _ZCounters;
uint _Width;                            // position map width (total threads along X)
uint _Height;                           // position map height (total threads along Y)
uint _Triangles;                        // mesh triangle count (total threads along Z)

[numthreads(8, 8, 8)]
void Main(uint3 id : SV_DispatchThreadID)
{
    uint pixelIndex = id.y * _Width + id.x; // row-major: each row holds _Width pixels

    // performing a raycast from _PositionMap[id.xy] against the triangle at index id.z
    HitInfo triangleHitInfo = ...

    // waiting for our turn to access resources
    while (true)
    {
        if (_ZCounters[pixelIndex] == id.z)
        {            
            // CONCURRENCY SAFE AREA BEGINS
            
            // performing a depth test
            bool depthTest = triangleHitInfo.depth <= _HitInfo[pixelIndex].depth;
            
            if (depthTest)
            {
                // overwriting the previously written data
                _HitInfo[pixelIndex] = triangleHitInfo;
            }

            // allowing the next thread to enter this condition block
            _ZCounters[pixelIndex]++;

            // CONCURRENCY SAFE AREA ENDS
            
            break;
        }
    }
}

Here I use an infinite loop in which every thread repeatedly reads the counter for its pixel. Only the thread whose id.z matches the counter can be inside the if block, and until it increments the counter, no other thread should interfere with its reads and writes of _HitInfo[pixelIndex]. Therefore, I concluded that access to the data in this area happens in an orderly way.
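For comparison, the depth test could possibly be done without any spin-wait by using atomics. This is a rough two-pass sketch and not something I have verified in my project. It assumes a hypothetical buffer _MinDepthBits (one uint per pixel, pre-filled with 0x7F800000), non-negative depths that are never NaN, a row-major index, and that the hypothetical RaycastTriangle placeholder returns bit-identical results in both passes. DepthPass has to finish before ResolvePass is dispatched.

```hlsl
#pragma kernel DepthPass
#pragma kernel ResolvePass

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

Texture2D<float4> _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
RWStructuredBuffer<uint> _MinDepthBits;   // hypothetical: closest depth per pixel, stored as raw bits
uint _Width;
uint _Height;
uint _Triangles;

// Hypothetical placeholder for the elided ray/triangle test (always a miss).
HitInfo RaycastTriangle(float3 origin, uint triangleIndex)
{
    HitInfo hit;
    hit.depth = asfloat(0x7F800000u);     // +infinity
    hit.barycentric = float2(0.0, 0.0);
    hit.triangleIndex = triangleIndex;
    return hit;
}

[numthreads(8, 8, 8)]
void DepthPass(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Width || id.y >= _Height || id.z >= _Triangles)
        return;

    uint pixelIndex = id.y * _Width + id.x;
    HitInfo hit = RaycastTriangle(_PositionMap[id.xy].xyz, id.z);

    // for non-negative floats the bit pattern orders the same way as the value,
    // so an unsigned atomic min yields the closest depth
    InterlockedMin(_MinDepthBits[pixelIndex], asuint(hit.depth));
}

[numthreads(8, 8, 8)]
void ResolvePass(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Width || id.y >= _Height || id.z >= _Triangles)
        return;

    uint pixelIndex = id.y * _Width + id.x;
    HitInfo hit = RaycastTriangle(_PositionMap[id.xy].xyz, id.z);

    // only the thread whose depth won the atomic min writes the full record
    bool isHit = asuint(hit.depth) != 0x7F800000u;
    if (isHit && asuint(hit.depth) == _MinDepthBits[pixelIndex])
    {
        _HitInfo[pixelIndex] = hit;
    }
}
```

Two triangles with exactly the same depth at one pixel would still both write in ResolvePass, so ties stay unordered in this sketch.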

Problem:

Unfortunately, my conclusion was incorrect, and this approach did not get rid of the race condition. The contents of the _HitInfo buffer differ from run to run. I just can't figure out where I made a mistake. All I can say is that the shader is unlikely to be stopped by a timeout, because I have managed to freeze the computer with endless loops before.

I hope someone can help me solve this problem.

EDIT

I moved the break statement, and that solved the problem.

Previously, each thread exited the loop immediately after incrementing the counter:

while (true)
{
    if (_ZCounters[xyIndex] == id.z)
    {
        if (triangleHitInfo.depth <= _HitInfo[xyIndex].depth)
        {
            _HitInfo[xyIndex] = triangleHitInfo;
        }
        
        _ZCounters[xyIndex]++;
        
        break;
    }
}

Now each thread keeps spinning in the loop after its turn. Once the counter has reached the triangle count, all threads exit the loop together:

while (true)
{
    if (_ZCounters[xyIndex] == id.z)
    {
        if (triangleHitInfo.depth <= _HitInfo[xyIndex].depth)
        {
            _HitInfo[xyIndex] = triangleHitInfo;
        }
        
        _ZCounters[xyIndex]++;
    }

    if (_ZCounters[xyIndex] == _Triangles)
    {
        break;
    }
}

I'd still be deeply grateful if anyone could point out what was causing this behavior.
