How to use custom compute shaders with Metal and get smooth performance?


I'm trying to apply live camera filters through Metal, using the default MPSKernel filters provided by Apple together with custom compute shaders.

After my compute shader pass I encode an MPSImageGaussianBlur in place, and the code goes here:

func encode(to commandBuffer: MTLCommandBuffer,
            sourceTexture: MTLTexture,
            destinationTexture: MTLTexture,
            cropRect: MTLRegion = MTLRegion(),
            offset: CGPoint) {

    let blur = MPSImageGaussianBlur(device: device, sigma: 0)
    blur.clipRect = cropRect
    blur.offset = MPSOffset(x: Int(offset.x), y: Int(offset.y), z: 0)

    let threadsPerThreadgroup = MTLSizeMake(4, 4, 1)
    // Round up so the grid still covers textures whose dimensions are not
    // multiples of the threadgroup size.
    let threadgroupsPerGrid = MTLSizeMake(
        (sourceTexture.width + threadsPerThreadgroup.width - 1) / threadsPerThreadgroup.width,
        (sourceTexture.height + threadsPerThreadgroup.height - 1) / threadsPerThreadgroup.height,
        1)

    // Custom compute pass: sourceTexture -> destinationTexture
    guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }
    commandEncoder.setComputePipelineState(pipelineState!)
    commandEncoder.setTexture(sourceTexture, index: 0)
    commandEncoder.setTexture(destinationTexture, index: 1)
    commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
    commandEncoder.endEncoding()

    // Blur the result in place, with no fallback allocator.
    var inPlaceTexture = destinationTexture
    blur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inPlaceTexture, fallbackCopyAllocator: nil)
}

But sometimes the in-place encode tends to fail, and that eventually causes a visible stutter on screen.

So if anyone can suggest a solution that avoids the in-place texture, show how to use the fallbackCopyAllocator, or suggest a different way of using compute shaders, that would be really helpful.


There are 2 answers below.


I have done a fair amount of coding in this area (applying compute shaders to the video stream from the camera), and the most common problem you run into is the "pixel buffer reuse" issue.

The Metal texture you create from the sample buffer is backed by a pixel buffer, which is managed by the video session and can be reused for subsequent video frames, unless you retain a reference to the sample buffer (retaining a reference to the Metal texture is not enough).

Feel free to take a look at my code at https://github.com/snakajima/vs-metal, which applies various compute shaders to a live video stream.

The VSContext:set() method takes an optional sampleBuffer parameter in addition to the texture parameter, and retains the reference to the sample buffer until the compute shader's computation is completed (in the VSRuntime:encode() method).
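Here is the same idea in isolation, as a minimal sketch (my illustrative code, not the actual vs-metal API): create the Metal texture from the camera's pixel buffer, then capture the sample buffer in the command buffer's completion handler so it stays alive until the GPU has finished reading from the texture.

import AVFoundation
import CoreVideo
import Metal

func process(sampleBuffer: CMSampleBuffer,
             textureCache: CVMetalTextureCache,
             commandBuffer: MTLCommandBuffer) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Wrap the camera's pixel buffer in a Metal texture (zero-copy).
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, pixelBuffer, nil, .bgra8Unorm,
        CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer),
        0, &cvTexture)
    guard let cvTex = cvTexture, let texture = CVMetalTextureGetTexture(cvTex) else { return }

    // ... encode compute work that reads `texture` here ...
    _ = texture  // placeholder for the real encode

    commandBuffer.addCompletedHandler { _ in
        // Capturing sampleBuffer and cvTex here retains them until the GPU
        // is done, so the session cannot recycle the pixel buffer mid-frame.
        _ = sampleBuffer
        _ = cvTex
    }
    commandBuffer.commit()
}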


The in-place operation method can be hit or miss depending on what the underlying filter is doing. A filter can typically run in place only when it is single-pass, and whether it is single-pass can depend on its parameters; for the parameter values where it isn't, you'll end up running out of place.
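Since the question also asks how to use the fallbackCopyAllocator, here is a minimal sketch (the identifiers are my own): the allocator is a closure that MPS calls only when the filter cannot run in place; it returns a texture for the filter to write into, and the inout texture reference is updated to point at that replacement.

import Metal
import MetalPerformanceShaders

// Called by MPS only when the in-place encode cannot be performed.
let fallbackAllocator: MPSCopyAllocator = { _, commandBuffer, sourceTexture in
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: sourceTexture.pixelFormat,
        width: sourceTexture.width,
        height: sourceTexture.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]
    return commandBuffer.device.makeTexture(descriptor: descriptor)!
}

func encodeInPlace(_ blur: MPSImageGaussianBlur,
                   commandBuffer: MTLCommandBuffer,
                   texture: inout MTLTexture) -> Bool {
    // Returns true on success. If the fallback fired, `texture` now points
    // at the newly allocated replacement, so keep using the inout reference.
    return blur.encode(commandBuffer: commandBuffer,
                       inPlaceTexture: &texture,
                       fallbackCopyAllocator: fallbackAllocator)
}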

Since that method was added, MPS has gained an underlying MTLHeap to manage memory a bit more transparently for you. If your MPSImage doesn't need to be viewed by the CPU and exists for only a short period of time on the GPU, it is recommended that you just use an MPSTemporaryImage instead. When its readCount hits 0, the backing store is recycled through the MPS heap and made available for other MPSTemporaryImages and other temporary resources used downstream. Likewise, the backing store isn't actually allocated from the heap until absolutely necessary (e.g. when the texture is written to, or .texture is called). A separate heap is allocated for each command buffer.
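To make that concrete, here is a minimal sketch under the question's setup (pipelineState and the 4x4 threadgroup size come from the question; the rest is illustrative): the custom compute pass writes into an MPSTemporaryImage, and the blur then writes out of place into the real destination, so the in-place encode is avoided entirely.

import Metal
import MetalPerformanceShaders

func encodeWithTemporaryImage(commandBuffer: MTLCommandBuffer,
                              pipelineState: MTLComputePipelineState,
                              sourceTexture: MTLTexture,
                              destinationTexture: MTLTexture) {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: sourceTexture.pixelFormat,
        width: sourceTexture.width,
        height: sourceTexture.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]

    // Backed by the per-command-buffer MPS heap; storage is not allocated
    // until the texture is first used, and is recycled once readCount hits 0.
    let intermediate = MPSTemporaryImage(commandBuffer: commandBuffer,
                                         textureDescriptor: descriptor)

    // Custom compute pass: sourceTexture -> intermediate.
    guard let encoder = commandBuffer.makeComputeCommandEncoder() else { return }
    encoder.setComputePipelineState(pipelineState)
    encoder.setTexture(sourceTexture, index: 0)
    encoder.setTexture(intermediate.texture, index: 1)
    let threadsPerThreadgroup = MTLSizeMake(4, 4, 1)
    let threadgroupsPerGrid = MTLSizeMake((sourceTexture.width + 3) / 4,
                                          (sourceTexture.height + 3) / 4, 1)
    encoder.dispatchThreadgroups(threadgroupsPerGrid,
                                 threadsPerThreadgroup: threadsPerThreadgroup)
    encoder.endEncoding()

    // Gaussian blur out of place: intermediate -> destinationTexture.
    let blur = MPSImageGaussianBlur(device: commandBuffer.device, sigma: 4.0)
    blur.encode(commandBuffer: commandBuffer,
                sourceTexture: intermediate.texture,
                destinationTexture: destinationTexture)

    // Texture-based MPS kernels do not decrement readCount automatically, so
    // release the temporary image once its last read has been encoded.
    intermediate.readCount = 0
}

Because the blur always has a distinct destination here, the hit-or-miss in-place path never comes into play, and the intermediate's storage is recycled by the heap instead of being allocated per frame.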

Using temporary images should help reduce memory usage quite a lot. For example, in an Inception v3 neural network graph, which has over a hundred passes, the heap was able to automatically reduce the graph's storage to just four allocations.