Issue creating MTLBuffer from MTLTexture used as inputs in CoreML Custom Layer for GPU execution

I am trying to create a CoreML Custom layer that runs on the GPU, using Objective-C for CoreML setup and Metal for GPU programming.

I have created the CoreML model with the custom layer and can execute it on the GPU successfully. In my setup for the actual GPU execution, I want to create an MTLBuffer from an input MTLTexture. However, I can't seem to do so, and I also can't get the memory address of the MTLTexture's memory.

When defining a custom layer in CoreML to run on the GPU, the following method needs to be defined, with this prototype:

- (BOOL)encodeToCommandBuffer:(id<MTLCommandBuffer>)commandBuffer inputs:(NSArray<id<MTLTexture>> *)inputs outputs:(NSArray<id<MTLTexture>> *)outputs error:(NSError *__autoreleasing _Nullable *)error {

    // GPU setup, moving data, encoding, execution and so on here
    return YES;
}


Here, the inputs are passed as an NSArray of MTLTextures, and we then pass these textures on to the Metal shader for computation. My problem is that I want to pass the Metal shader an MTLBuffer that points to the input data, say inputs[0], but I am having trouble copying the input MTLTexture to an MTLBuffer.

I have tried using an MTLBlitCommandEncoder to copy the data from the MTLTexture to an MTLBuffer like so:

id<MTLBuffer> test_buffer = [command_PSO.device newBufferWithLength:(8) options:MTLResourceStorageModeShared];
id <MTLBlitCommandEncoder> blitCommandEncoder = [commandBuffer blitCommandEncoder];
[blitCommandEncoder copyFromTexture:inputs[0]
                           sourceOrigin:MTLOriginMake(0, 0, 0)
                             sourceSize:MTLSizeMake(1, 1, 1)
[blitCommandEncoder endEncoding];

The above example should copy a single pixel from the MTLTexture, inputs[0], to the MTLBuffer, test_buffer, but it does not.

The MTLTexture getBytes method doesn't work either, because the inputs have MTLResourceStorageModePrivate set.

When I inspect the input MTLTexture, I notice that its buffer attribute is <null>. I'm wondering whether this is the issue: the texture was not created from a buffer, so perhaps it doesn't expose its memory address easily. But surely we should be able to get the memory address somewhere?

For further reference, here is the input MTLTexture description:

<CaptureMTLTexture: 0x282469500> -> <AGXA14FamilyTexture: 0x133d9bb00>
    label = <none> 
    textureType = MTLTextureType2DArray 
    pixelFormat = MTLPixelFormatRGBA16Float 
    width = 8 
    height = 1 
    depth = 1 
    arrayLength = 1 
    mipmapLevelCount = 1 
    sampleCount = 1 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    usage = MTLTextureUsageShaderRead MTLTextureUsageShaderWrite 
    shareable = 0 
    framebufferOnly = 0 
    purgeableState = MTLPurgeableStateNonVolatile 
    swizzle = [MTLTextureSwizzleRed, MTLTextureSwizzleGreen, MTLTextureSwizzleBlue, MTLTextureSwizzleAlpha] 
    isCompressed = 0 
    parentTexture = <null> 
    parentRelativeLevel = 0 
    parentRelativeSlice = 0 
    buffer = <null> 
    bufferOffset = 0 
    bufferBytesPerRow = 0 
    iosurface = 0x0 
    iosurfacePlane = 0 
    allowGPUOptimizedContents = YES
    label = <none>
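The dump shows pixelFormat = MTLPixelFormatRGBA16Float, so once a pixel has been blitted into a shared buffer, its 8 bytes are four IEEE 754 binary16 (half) values, one per channel. Standard C has no half type, so the CPU side has to decode them. Below is a minimal sketch in plain C (valid inside an Objective-C file); the helper names `pow2f`, `half_to_float` and `decode_rgba16f_pixel` are hypothetical, and it assumes the bytes are in the host's native byte order.

```c
#include <assert.h>
#include <math.h>    /* INFINITY and NAN macros only */
#include <stdint.h>
#include <string.h>

/* Exact power of two for small exponents (no libm needed). */
static float pow2f(int k) {
    float scale = 1.0f;
    for (; k > 0; --k) scale *= 2.0f;
    for (; k < 0; ++k) scale *= 0.5f;
    return scale;
}

/* Decode one binary16 value: 1 sign bit, 5 exponent bits, 10 mantissa bits. */
static float half_to_float(uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float value;
    if (exponent == 0) {
        /* Zero or subnormal: mantissa * 2^-24 */
        value = (float)mantissa * pow2f(-24);
    } else if (exponent == 0x1F) {
        /* Infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1 + mantissa/1024) * 2^(exponent-15) = (1024 + mantissa) * 2^(exponent-25) */
        value = (float)(mantissa + 1024) * pow2f(exponent - 25);
    }
    return sign ? -value : value;
}

/* Hypothetical helper: unpack one RGBA16Float pixel (8 bytes, e.g. read from a
   shared MTLBuffer's contents) into four floats in R, G, B, A order. */
static void decode_rgba16f_pixel(const void *bytes, float out_rgba[4]) {
    assert(bytes != NULL && out_rgba != NULL);
    uint16_t halves[4];
    memcpy(halves, bytes, sizeof halves);  /* avoids unaligned/aliasing issues */
    for (int i = 0; i < 4; ++i) {
        out_rgba[i] = half_to_float(halves[i]);
    }
}
```

Using `memcpy` rather than casting the buffer pointer to `uint16_t *` keeps the read well-defined regardless of the buffer offset's alignment.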

In your snippet, destinationBytesPerRow and destinationBytesPerImage are calculated incorrectly. Referring to the documentation:

destinationBytesPerRow: The stride in bytes between rows of the source texture memory. The value must be a multiple of the source texture's pixel size, in bytes. The value must be less than or equal to 32,767 multiplied by the source texture’s pixel size.

destinationBytesPerImage: For 3D textures and 2D array textures, the stride in bytes between 2D images of the source buffer memory. The value must be a multiple of the source texture's pixel size, in bytes.

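Your texture is 8 pixels wide and has the .rgba16Float format (4 channels of 2 bytes each, so 8 bytes per pixel), which means destinationBytesPerRow should be 8 * 8 = 64. The same goes for destinationBytesPerImage: the texture is only 1 pixel high, so one image is a single row, and it should also be 64 in your case.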
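The stride arithmetic above can be sketched in plain C (which also compiles as Objective-C), including the two constraints quoted from the documentation. The helper names are hypothetical, and the constant assumes MTLPixelFormatRGBA16Float specifically; other pixel formats have different pixel sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MTLPixelFormatRGBA16Float: 4 channels, each a 16-bit half float. */
#define RGBA16F_BYTES_PER_PIXEL ((size_t)4 * sizeof(uint16_t))

/* Hypothetical helper: destinationBytesPerRow for a tightly packed row. */
static size_t rgba16f_bytes_per_row(size_t width) {
    return width * RGBA16F_BYTES_PER_PIXEL;
}

/* Hypothetical helper: destinationBytesPerImage, i.e. one full 2D slice. */
static size_t rgba16f_bytes_per_image(size_t width, size_t height) {
    assert(width > 0 && height > 0);
    return rgba16f_bytes_per_row(width) * height;
}

/* Checks the documented limits: a multiple of the pixel size and at most
   32,767 times the pixel size. */
static bool rgba16f_row_stride_is_valid(size_t bytes_per_row) {
    return bytes_per_row % RGBA16F_BYTES_PER_PIXEL == 0 &&
           bytes_per_row <= (size_t)32767 * RGBA16F_BYTES_PER_PIXEL;
}
```

For the 8 × 1 texture in the question, both helpers give 64. Copying the full slice also needs a destination buffer of at least 64 bytes, which is larger than the 8-byte test_buffer used for the single-pixel copy.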