I have a custom neural net here, made mostly of the usual building blocks (conv, ReLU, max pool, etc.).
But the last layer needs a sigmoid on some feature channels and a softmax on others (I'm trying to predict classes and a confidence index).
So I'm trying to write a custom Metal kernel that operates on `MPSImage.texture` at the end of my convolutions.
Here is where it gets hairy. I think each `conv_img.texture` array entry contains 4 layers of my neural net (one per channel of `rgba`).
So in my custom Metal kernel, each thread in the grid reads a single value that packs 4 layers.
Here is my sigmoid example, which is supposed to read the grid values at the desired layer and store the output value in some other layer.
I don't think it works, for the reason mentioned above.
I'd love some help, or an example of custom layers implemented after an Apple-provided `MPSConvolution`.
```metal
kernel void sigmoid(texture2d_array<float, access::read> inTexture [[texture(0)]],
                    texture2d_array<float, access::write> outTexture [[texture(1)]],
                    uint3 gid [[thread_position_in_grid]]) {
    // Skip threads that fall outside the output texture.
    if (gid.x >= outTexture.get_width()
        || gid.y >= outTexture.get_height()
        || gid.z >= outTexture.get_array_size()) {
        return;
    }
    // One box prediction: 80 class scores + 4 box coordinates + 1 confidence.
    const uint classes = 80;
    const uint coords = 4;
    const uint size_pred = (classes + coords + 1);
    uint layer = size_pred * gid.z + 4;
    float4 x = inTexture.read(uint2(gid.x, gid.y), layer);
    // Element-wise sigmoid on all four components of the texel.
    float4 y = float4(1.0 / (1.0 + exp(-x)));
    outTexture.write(y, uint2(gid.x, gid.y), gid.z);
}
```
EDIT: This is how I've been trying to debug it.
The kernel below writes a value that encodes each thread's position, so the final buffer with all the values should look like my original thread grid.
But it doesn't seem to work.
```metal
kernel void sigmoid(texture2d_array<float, access::read> inTexture [[texture(0)]],
                    texture2d_array<float, access::write> outTexture [[texture(1)]],
                    uint3 gid [[thread_position_in_grid]])
{
    if (gid.x >= outTexture.get_width()
        || gid.y >= outTexture.get_height()
        || gid.z >= outTexture.get_array_size()) {
        return;
    }
    const uint classes = 80;
    const uint coords = 4;
    const uint texture_channels = 4;
    // Integer division: (80 + 4 + 1) / 4 evaluates to 21.
    const uint size_pred = (classes + coords + 1) / texture_channels;
    const uint layer = size_pred * gid.z + 4;
    const uint offset = layer % texture_channels;
    const float4 channels = inTexture.read(uint2(gid.x, gid.y), layer);
    // const float r = channels[0];
    // const float g = channels[1];
    // const float b = channels[2];
    // const float a = channels[3];
    const float channel_of_interest = channels[offset];
    // In the output buffer I should see a bunch of numbers that look
    // like the thread grid.
    const float y = gid.x * 10000 + gid.y * 10 + gid.z;
    // float y = float(1.0 / (1.0 + exp(-channel_of_interest)));
    const uint new_offset = gid.z % texture_channels;
    float4 new_channels = 0;
    new_channels[new_offset] = y;
    // Note: this writes the scalar y (replicated into all four components),
    // not new_channels.
    outTexture.write(y, uint2(gid.x, gid.y), gid.z);
}
```