In Metal, how to wait for all GPU operations in my process to complete?


I'm writing an image-processing plugin (shared library) for an application on macOS. The app hands me a Metal buffer I'm supposed to process. I copy that buffer into a texture with a blit encoder, then run a kernel that reads that texture. The problem is that sometimes I get bad texture data from that buffer. If I wait before blitting it to the texture (sleep for 30 ms, for instance) it's 100% reliable, but the sleep slows things down a lot. Is there any way to wait for the buffer to be "ready" (i.e. for any pending operations on it to complete)? Unfortunately I have no access to the host app's closed-source code, so I can't get at the command buffers it uses to produce that buffer, which means there's no way for me to add fences, syncs, or completion handlers. I just have the buffer itself. The buffer is in shared mode, which should be fine:

MTLBuffer(len=134217728, contents=0x43e078000, '<AGXG13XFamilyBuffer: 0x2a0a9bf10>
    label = GF Metal Memory Pool Buffer 
    length = 134217728 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModeShared 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModeShared MTLResourceHazardTrackingModeTracked  
    purgeableState = MTLPurgeableStateNonVolatile')

I've looked into fences, events, and anything else I can find, but nothing seems to help other than just waiting. I tried calling blitEncoder->synchronizeResource(buffer); but since the buffer isn't in managed mode, I think that does nothing. So is there any way to wait for it to be ready, or to add a global (per-process) GPU fence that waits for all GPU operations to complete, or similar?

Here's my code. Note that the buffer is passed in to me from the host. Also, I'm using the metal-cpp C++ interface, but that's a shallow wrapper around the usual Objective-C API.

  // Create encoder for blitting to the GPU texture
  MTL::BlitCommandEncoder* blitEncoder = commandBuffer->blitCommandEncoder();
  // This fixes it, at a high cost in performance:
  // std::this_thread::sleep_for(std::chrono::milliseconds(30));

  const MTL::Origin origin = {0, 0, 0};
  const MTL::Size size = {width, height, 1};  // Match the texture's dimensions
  blitEncoder->copyFromBuffer(buffer, 0,  // buffer, source offset
                              rowbytes,   // source bytes/row
                              0,          // source bytes/image (0 for 2d)
                              size,       // sourceSize,
                              texture,    // dest texture
                              0,          // slice
                              0,          // level
                              origin      // dest origin
  );

  blitEncoder->endEncoding();


Answer by Spo1ler:

There is no such method, and waiting for all GPU work in the process would be a bad idea anyway.

You need to think about the user application in terms of its timelines: there's the GPU timeline and the CPU timeline. Changes to MTLTextures happen almost exclusively on the GPU timeline. Since you can't stop that timeline or hold it up, and the user application is doing work on the GPU timeline that might change the texture, you need to cooperate with the application. That means you need ordering on the GPU timeline.
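To make the two timelines concrete, here's a minimal sketch in metal-cpp; `queue` and the encoded work are placeholders, not anything from the host app. Committing a command buffer returns immediately on the CPU timeline, while the texture only actually changes when the GPU timeline reaches that work:

    MTL::CommandBuffer* cb = queue->commandBuffer();
    // ... encode blit/compute work here; it runs later, on the GPU timeline ...
    cb->addCompletedHandler(MTL::HandlerFunction([](MTL::CommandBuffer*) {
        // Called only once the GPU has finished this command buffer.
    }));
    cb->commit();  // returns immediately on the CPU timeline
    // cb->waitUntilCompleted() would stall the CPU timeline, but only for this
    // command buffer -- it says nothing about the host app's command buffers.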

There are a couple of ways to do this, and they are all a bit more involved than just passing the MTLTexture as an argument. First, you could take an MTLCommandBuffer from the user application as an argument and encode your work into it. Second, you could take an MTLSharedEvent, keep an MTLCommandQueue internal to your library, and create MTLCommandBuffers from it that wait on and signal that MTLSharedEvent in some predetermined way the user application knows about. Either way, the library user becomes responsible for placing your external work on the application's GPU timeline.
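As a hedged sketch of the second option in metal-cpp: the function below, its name, and the hostEvent/readyValue convention are all hypothetical, describing an interface the host application would have to agree to expose; object release/retain is elided for brevity.

    #include <Metal/Metal.hpp>

    void processBuffer(MTL::Device* device, MTL::Buffer* buffer,
                       MTL::SharedEvent* hostEvent, uint64_t readyValue,
                       MTL::Texture* texture, NS::UInteger rowbytes,
                       NS::UInteger width, NS::UInteger height)
    {
        MTL::CommandQueue*  queue = device->newCommandQueue();  // plugin-internal queue
        MTL::CommandBuffer* cb    = queue->commandBuffer();

        // GPU-side wait: nothing encoded below runs until the host app's
        // command buffer signals `readyValue` on the shared event.
        cb->encodeWait(hostEvent, readyValue);

        MTL::BlitCommandEncoder* blit = cb->blitCommandEncoder();
        blit->copyFromBuffer(buffer, 0, rowbytes, 0,
                             MTL::Size(width, height, 1),
                             texture, 0, 0, MTL::Origin(0, 0, 0));
        blit->endEncoding();

        // Optionally tell the host the plugin's GPU work is finished.
        cb->encodeSignalEvent(hostEvent, readyValue + 1);
        cb->commit();
    }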

To prove my point, let's look at how libraries "external" to Metal work. For example, MPS is full of encodeToCommandBuffer:... methods that take a command buffer plus whatever other arguments are needed to complete the operation. MetalFX does the same with its encodeToCommandBuffer: method, except that instead of passing all the needed state as arguments, the state lives in the MTLFX*Scaler object.
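Adapted to this plugin, the first option (take the host's command buffer, MPS-style) might look roughly like the sketch below; `encodeCopyToTexture` and its parameter list are made up for illustration, since the real change is that the host app has to hand over its MTLCommandBuffer:

    #include <Metal/Metal.hpp>

    // Encode the buffer-to-texture copy into the *host's* command buffer, so it
    // is ordered after whatever the host encoded before it on the GPU timeline.
    void encodeCopyToTexture(MTL::CommandBuffer* hostCommandBuffer,
                             MTL::Buffer* buffer, NS::UInteger rowbytes,
                             MTL::Texture* texture,
                             NS::UInteger width, NS::UInteger height)
    {
        MTL::BlitCommandEncoder* blit = hostCommandBuffer->blitCommandEncoder();
        blit->copyFromBuffer(buffer, 0, rowbytes, 0,
                             MTL::Size(width, height, 1),
                             texture, 0, 0, MTL::Origin(0, 0, 0));
        blit->endEncoding();
        // The host app commits its command buffer as usual; no sleep needed.
    }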

There's no other proper way to do it if all you have is an MTLTexture. To give you an example about the sleep: you have no idea what frame rate the application is running at, or whether it's going to use the MTLTexture it passed you again within the next however many milliseconds. If it's running at 120 FPS and uses the MTLTexture every frame, sleeping for 30 ms means you skip at least three frames, maybe four if your CPU processing takes additional time. If it's running at 30 FPS, 30 ms might not even be long enough for the frame that produces the buffer to finish. It might work in some very limited capacity if the texture is "one and done", but that's usually not the case.

As for why waiting for all GPU work to stop is a bad idea: your library is not the only thing submitting GPU work inside the user application. You can't just halt everything to insert your own work, because you have no idea how those submissions are synchronized, where they come from, whether there's a deadline on the GPU timeline they need to meet, or whether the CPU is waiting on a GPU result. You just can't reason about this.