Implement a custom layer after a series of MPSCNNConvolution layers

I have a custom neural network made mostly of the usual building blocks (convolution, ReLU, max pooling, etc.).

However, the last layer needs a sigmoid on some feature channels and a softmax on others (I'm trying to predict classes and a confidence score).
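
To make the split between the two activations concrete, here is a small CPU-side C++ sketch (Metal Shading Language is itself based on C++). It assumes a per-box channel layout inferred from the constants in the kernels below: 4 coordinates, then 1 confidence value, then 80 class scores, for 85 channels per box. That layout and the names activateBox and kBoxStride are hypothetical. The sketch applies a sigmoid to one box's confidence channel and a numerically stable softmax across its class channels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed per-box layout, inferred from the kernel constants (not confirmed):
// [x, y, w, h, confidence, class_0 ... class_79]
constexpr std::size_t kCoords = 4;
constexpr std::size_t kClasses = 80;
constexpr std::size_t kBoxStride = kCoords + 1 + kClasses;  // 85 channels per box

float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// `cell` holds every feature channel of one (x, y) position of the image.
// Applies a sigmoid to box `box`'s confidence channel and a softmax across
// its class channels, in place.
void activateBox(std::vector<float>& cell, std::size_t box) {
    const std::size_t base = box * kBoxStride;
    cell[base + kCoords] = sigmoid(cell[base + kCoords]);

    float* scores = cell.data() + base + kCoords + 1;
    // Subtracting the largest logit keeps std::exp from overflowing.
    const float maxLogit = *std::max_element(scores, scores + kClasses);
    float sum = 0.0f;
    for (std::size_t c = 0; c < kClasses; ++c) {
        scores[c] = std::exp(scores[c] - maxLogit);
        sum += scores[c];
    }
    for (std::size_t c = 0; c < kClasses; ++c) {
        scores[c] /= sum;
    }
}
```

The softmax is the part that does not fit a one-texel-per-thread kernel: under this layout the 80 class channels of one box span 20 or 21 texture array slices, so a thread computing it has to read all of those slices at its (x, y) position.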

Anyway, I'm trying to write a custom Metal compute kernel that operates on the MPSImage.texture produced by my last convolution.
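
One common way to launch such a kernel is one thread per texel, with the grid's z dimension running over the texture array slices. The hypothetical C++ helper below mirrors the arithmetic you would pass to MTLComputeCommandEncoder's dispatchThreadgroups(_:threadsPerThreadgroup:). The 13x13 image size, the 425-channel count (5 boxes of 85 channels) and the 8x8x1 threadgroup are illustrative assumptions, not values from the question.

```cpp
#include <cstdint>

// Hypothetical stand-in for MTLSize.
struct Size3 {
    std::uint32_t width, height, depth;
};

constexpr std::uint32_t ceilDiv(std::uint32_t a, std::uint32_t b) {
    return (a + b - 1) / b;
}

// Slices needed for `featureChannels` channels packed 4 per slice (rgba).
constexpr std::uint32_t sliceCount(std::uint32_t featureChannels) {
    return ceilDiv(featureChannels, 4);
}

// Threadgroups needed so that every (x, y, slice) texel gets one thread.
Size3 threadgroupsPerGrid(std::uint32_t imageWidth, std::uint32_t imageHeight,
                          std::uint32_t featureChannels, Size3 threadsPerGroup) {
    return {ceilDiv(imageWidth, threadsPerGroup.width),
            ceilDiv(imageHeight, threadsPerGroup.height),
            ceilDiv(sliceCount(featureChannels), threadsPerGroup.depth)};
}
```

Because the counts are rounded up, some threads land outside the texture, which is what the early return on get_width(), get_height() and get_array_size() in the kernels guards against.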

Here is where it gets hairy. I think each slice of the conv_img.texture array holds 4 feature channels of my network (one in each of the r, g, b and a components).
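
Assuming the standard MPSImage layout (with more than 4 feature channels the texture is a 2D array, and slice s stores channels 4s to 4s + 3 in its r, g, b and a components), mapping between a feature-channel index and a texel location is plain integer arithmetic. The C++ sketch below uses hypothetical names; in a Metal kernel the same expressions would select the slice for texture2d_array::read and the component of the returned float4.

```cpp
#include <cstdint>

// Where one feature channel lives inside the MPSImage's 2D texture array.
struct TexelLocation {
    std::uint32_t slice;      // array index passed to texture2d_array::read
    std::uint32_t component;  // 0 = r, 1 = g, 2 = b, 3 = a
};

constexpr std::uint32_t kChannelsPerSlice = 4;

constexpr TexelLocation locateChannel(std::uint32_t channel) {
    return {channel / kChannelsPerSlice, channel % kChannelsPerSlice};
}

// Inverse mapping: the feature channel stored at (slice, component).
constexpr std::uint32_t channelAt(std::uint32_t slice, std::uint32_t component) {
    return slice * kChannelsPerSlice + component;
}
```

For example, channel 89 is component 1 (g) of slice 22.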

So in my custom Metal kernel, each thread reads one float4 per grid position, and that single value covers 4 feature channels at once.
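
Put differently, a thread sees four channels rather than one, so per-channel logic has to loop over the components and recover each one's channel index. The C++ sketch below simulates a single thread, with std::array<float, 4> standing in for Metal's float4. The 85-channel box stride and the confidence offset of 4 come from the kernel's constants; the 425-channel total (5 boxes) and the name activateTexel are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

using Float4 = std::array<float, 4>;  // stands in for Metal's float4

constexpr std::uint32_t kBoxStride = 85;         // 4 coords + 1 confidence + 80 classes
constexpr std::uint32_t kConfidenceField = 4;    // confidence offset within a box
constexpr std::uint32_t kFeatureChannels = 425;  // hypothetical: 5 boxes * 85 channels

// What one thread at array slice `slice` would do with the texel it read:
// the 4 components are 4 different feature channels, so each is mapped back
// to its channel index before deciding whether to activate it.
Float4 activateTexel(Float4 texel, std::uint32_t slice) {
    for (std::uint32_t component = 0; component < 4; ++component) {
        const std::uint32_t channel = slice * 4 + component;
        if (channel >= kFeatureChannels) {
            break;  // padding in the last slice (425 is not a multiple of 4)
        }
        if (channel % kBoxStride == kConfidenceField) {
            texel[component] = 1.0f / (1.0f + std::exp(-texel[component]));
        }
    }
    return texel;
}
```

With 425 channels the last slice (106) holds only channel 424, so the padding check stops before touching the three unused components.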

Here is my sigmoid kernel, which is supposed to read the value of the desired channel at each grid position and store the result in another channel.

I don't think it works, for the reason mentioned above.

I'd love some help, or an example of a custom layer implemented after an Apple-provided MPSCNNConvolution.

kernel void sigmoid(texture2d_array<float, access::read> inTexture [[texture(0)]],
                    texture2d_array<float, access::write> outTexture [[texture(1)]],
                    uint3 gid [[thread_position_in_grid]]) {

    if (gid.x >= outTexture.get_width()
        || gid.y >= outTexture.get_height()
        || gid.z >= outTexture.get_array_size()) {
        return;
    }

    const uint classes = 80;
    const uint coords = 4;
    const uint size_pred = (classes + coords + 1);

    uint layer = size_pred * gid.z + 4;
    float4 x = inTexture.read(uint2(gid.x, gid.y), layer);
    float4 y = float4(1.0 / (1.0 + exp(-x)));
    outTexture.write(y, uint2(gid.x, gid.y), gid.z);
}
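
One likely problem with this kernel, if the channel layout is as described: layer = size_pred * gid.z + 4 is a feature-channel index, but the second argument of inTexture.read is an array-slice index, and each slice holds 4 channels. The C++ sketch below reuses the kernel's constants to show where each box's confidence channel actually lives; sliceOf and componentOf are hypothetical helper names.

```cpp
#include <cstdint>

constexpr std::uint32_t kClasses = 80;
constexpr std::uint32_t kCoords = 4;
constexpr std::uint32_t kSizePred = kClasses + kCoords + 1;  // 85, as in the kernel

// What the kernel computes as `layer`: a feature-channel index.
constexpr std::uint32_t confidenceChannel(std::uint32_t box) {
    return kSizePred * box + 4;
}

// texture2d_array::read expects a slice index, so the channel has to be
// split into the slice that holds it and the component within that slice.
constexpr std::uint32_t sliceOf(std::uint32_t channel) { return channel / 4; }
constexpr std::uint32_t componentOf(std::uint32_t channel) { return channel % 4; }
```

For gid.z = 1 the kernel reads slice 89 (channels 356 to 359, which belong to a different box), while the confidence value is component 1 of slice 22; for gid.z = 4 it asks for slice 344, past the end of the array if the image has 5 boxes (425 channels, 107 slices). In the kernel the read would need slice layer / 4 and then component layer % 4 of the returned float4. Because the confidence channels rotate through components 0, 1, 2, 3, applying the sigmoid to a whole float4 also activates three unrelated channels.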

EDIT: This is how I've been trying to debug it. The final buffer should contain values that encode each thread's original grid position, but it doesn't seem to work.

kernel void sigmoid(texture2d_array<float, access::read> inTexture [[texture(0)]],
                    texture2d_array<float, access::write> outTexture [[texture(1)]],
                    uint3 gid [[thread_position_in_grid]])
{
    if (gid.x >= outTexture.get_width()
        || gid.y >= outTexture.get_height()
        || gid.z >= outTexture.get_array_size()) {
        return;
    }

    const uint classes = 80;
    const uint coords = 4;
    const uint texture_channels = 4;
    const uint size_pred = (classes + coords + 1) / texture_channels;

    const uint layer = size_pred * gid.z + 4;
    const uint offset = layer % texture_channels;
    const float4 channels = inTexture.read(uint2(gid.x, gid.y), layer);
    // const float r = channels[0];
    // const float g = channels[1];
    // const float b = channels[2];
    // const float a = channels[3];
    const float channel_of_interest = channels[offset];

    // the output buffer should contain numbers that encode
    // each thread's grid position
    const float y = gid.x * 10000 + gid.y * 10 + gid.z;
    // float y = float(1.0 / (1.0 + exp(-channel_of_interest)));
    const uint new_offset = gid.z % texture_channels;
    float4 new_channels = 0;
    new_channels[new_offset] = y;
    outTexture.write(y, uint2(gid.x, gid.y), gid.z);
}
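
Two details in this debug kernel are worth checking with plain integer arithmetic. First, (classes + coords + 1) / texture_channels is 85 / 4, which truncates to 21; since 85 is not a multiple of 4, box boundaries fall in the middle of a slice and no integer per-box slice stride exists, so the channel index (85 * box + 4) has to be computed first and divided by 4 afterwards. Second, the packed value gid.x * 10000 + gid.y * 10 + gid.z is ambiguous once gid.z reaches 10. Separately, if the MPSImage uses 16-bit floats (MPSImageFeatureChannelFormat.float16), such values cannot round-trip either, because half precision tops out at 65504 and represents integers exactly only up to 2048. The C++ sketch below checks the two integer points; the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t kClasses = 80;
constexpr std::uint32_t kCoords = 4;
constexpr std::uint32_t kTextureChannels = 4;

// As written in the debug kernel: 85 / 4 truncates to 21.
constexpr std::uint32_t kTruncatedStride = (kClasses + kCoords + 1) / kTextureChannels;

// The debug kernel's `layer` for a given gid.z, using the truncated stride.
constexpr std::uint32_t debugKernelLayer(std::uint32_t z) {
    return kTruncatedStride * z + 4;
}

// The debug value written by the kernel for thread position (x, y, z).
constexpr std::uint32_t debugCode(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x * 10000 + y * 10 + z;
}
```

For gid.z = 1 the debug kernel computes layer 25, while box 1's confidence channel is 89. Note also that the last line writes the scalar y (which Metal replicates into all four components) rather than new_channels, so new_channels is never used. Writing something like float4(gid.x, gid.y, gid.z, 0) would keep each coordinate in its own component.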