CUDA kernel to convert P010LE into ARGB

Question

CUDA kernel to convert P010LE into ARGB

117 Views Asked by Robinson At 22 September 2023 at 21:14

I have frames extracted from a 10 bit movie, using ffmpeg writing P010LE frames. I'm loading the frames into a vector of uint8_t. I've verified the frame data by viewing it with rawpixels.net. I'm trying to make a CUDA kernel that will convert these P010LE frames into ARGB format. Apparently this is beyond my mental horsepower.

Anyway, to simplify I thought I'd just show the luma, then maybe ask a question about the chroma once I've failed that. My kernel looks like this:

__global__ void P010leToArgbKernel(const uint8_t * P010le, uint8_t * argb, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < width && y < height)
    {
        int yIndexP010 = (y * width + x) * 2;

        uint16_t Y = *reinterpret_cast<const uint16_t*>(P010le + yIndexP010) & 0x3FF;
        uint8_t Y8 = static_cast<uint8_t>((Y * 255 + 511) / 1023); // Proper rounding

        int yIndex = (y * width + x) * 4;

        argb[yIndex + 0] = Y8;
        argb[yIndex + 1] = Y8;
        argb[yIndex + 2] = Y8;
        argb[yIndex + 3] = 255; // Alpha
    }
}

The result is as follows, which I do not believe to be correct (original image left, converted image right). I'm expecting a smooth, greyscale image. What mistake did I make?

Original Q&A

There are 1 best solutions below

**Robinson** · Accepted Answer · 2023-09-23T13:15:49.107000

Following a hint from Christoph I realised the P010LE format actually stores its 10 bits in the high 10, not the low. Shifting down 6 times brings the data into the low 10 bits. Apparently it's good practice to mask off the bits you want in any case, so I have done that here.

Y = (Y >> 6) & 0x3FF;

The final kernel (probably not the fastest possible but it works) is below. When I used this with GDI+ so I could get a very quick-to-code visualisation, I also realised that its PixelFormat32bppARGB is actually BGRA on the underlying bitmap, so I've renamed it from P010le_To_Argb_Kernel to P010le_To_Bgra_Kernel.

__global__ void P010le_To_Bgra_Kernel(const uint8_t * P010le, uint8_t * argb, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < width && y < height)
    {
        int yIndexP010 = (y * width * 2) + (x * 2);

        uint16_t Y = *reinterpret_cast<const uint16_t*>(P010le + yIndexP010);
        
        // Important note:  The data is in the high ten bits, not the low!

        Y = (Y >> 6) & 0x3FF;
        
        int uvIndexP010 = (width * height * 2) + ((y / 2) * (width * 2)) + (x / 2) * 4;

        uint16_t U = *reinterpret_cast<const uint16_t *>(P010le + uvIndexP010) >> 6 & 0x3FF;
        uint16_t V = *reinterpret_cast<const uint16_t*>(P010le + uvIndexP010 + 2) >> 6 & 0x3FF;

        uint8_t Y8 = static_cast<uint8_t>((Y * 255 + 511) / 1023); // Scale to 8-bit.
        uint8_t U8 = static_cast<uint8_t>((U * 255 + 511) / 1023);
        uint8_t V8 = static_cast<uint8_t>((V * 255 + 511) / 1023);

        int C = Y8 - 16;
        int D = U8 - 128;
        int E = V8 - 128;

        int argbIndex = (y * width + x) * 4;

        // GDI+ bitmap format PixelFormat32bppARGB claims to be ARGB but in byte order it's BGRA.

        argb[argbIndex + 3] = 255; // Alpha

        argb[argbIndex + 2] = min(max((298 * C + 409 * E + 128) >> 8, 0), 255); // Red
        argb[argbIndex + 1] = min(max((298 * C - 100 * D - 208 * E + 128) >> 8, 0), 255); // Green
        argb[argbIndex + 0] = min(max((298 * C + 516 * D + 128) >> 8, 0), 255); // Blue
    }
}

CUDA kernel to convert P010LE into ARGB

There are 1 best solutions below

Related Questions in IMAGE-PROCESSING

Related Questions in CUDA

Related Questions in PIXELFORMAT

Trending Questions

Popular # Hahtags

Popular Questions