How to add an alpha channel very fast to an RGB image using SSE2 and C++


I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows:

size of image: 1920 x 1200
time of RGBA to YUV conversion: 0.0029011
time of YUV to RGB conversion: 0.0044585
time of RGB to RGBA conversion (approach 1): 0.0064747
time of RGB to RGBA conversion (approach 2): 0.0066194
time of RGB to RGBA conversion (approach 3): 0.0069835

As you can see, the RGB to RGBA conversion takes longer than the YUV420p to RGB or RGBA to YUV420p. I'm having a lot of trouble interleaving the alpha channel in the YUV420p to RGB calculation, so I am trying a post-processing step (RGB to RGBA). Here is the code so far:

Approach 1:

void convertRGB24itoRGBA32i ( int width, int height, const unsigned char *RGB, unsigned char *RGBA ) {
    const size_t numPixels = (width - 1) * (height - 1);

    for ( size_t i = 0; i < numPixels; i++ ) 
    {
        __m128i sourcePixel = _mm_loadu_si128 ( (__m128i*)&RGB[i * 3] );
        //__m128i alphaChannel = _mm_setzero_si128 ( ); // Set alpha to 0 (transparent)
        __m128i alphaChannel = _mm_set1_epi32 ( 0xFF000000 );
        __m128i rgb32Pixel = _mm_or_si128 ( alphaChannel, sourcePixel );
        _mm_storeu_si128 ( (__m128i*)&RGBA[i * 4], rgb32Pixel );
    }
}

Approach 2:

void convertRGB24itoRGBA32i ( int width, int height, const RT_UByte *RGB, RT_UByte *RGBA )
{
    const size_t numPixels = (width - 1) * (height - 1);

    // Create the shuffle control mask for converting BGR to RGBA
    __m128i shuffleMask = _mm_setr_epi8 ( 2, 1, 0, 3, 5, 4, 3, 7, 8, 11, 10, 9, 13, 12, 15, 14 );

    for ( size_t i = 0; i < numPixels; i++ ) {
        __m128i sourcePixel = _mm_loadu_si128 ( reinterpret_cast<const __m128i*>(&RGB[i * 3]) );

        __m128i rgbaPixel = _mm_shuffle_epi8 ( sourcePixel, shuffleMask );

        __m128i alphaChannel = _mm_set1_epi32 ( 0xFF000000 );

        // Merge the RGBA channels
        rgbaPixel = _mm_or_si128 ( alphaChannel, rgbaPixel );

        _mm_storeu_si128 ( reinterpret_cast<__m128i*>(&RGBA[i * 4]), rgbaPixel );
    }
}

Approach 3:

inline void convertBGRi24toBGRAi32 ( const ubyte3 *bgri24, ubyte4* bgrai32, t_size size )
{
    for ( ; size != 0; --size, ++bgrai32, ++bgri24 )
    {
        bgrai32->x = bgri24->x;
        bgrai32->y = bgri24->y;
        bgrai32->z = bgri24->z;
        bgrai32->w = 0xff;
    }

}
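
For comparison, here is a minimal sketch of what a post-processing step restricted to SSE2 could look like. Note that `_mm_shuffle_epi8`, used in Approach 2, is an SSSE3 intrinsic rather than SSE2. The function name `rgbToRgbaSse2` and its signature are hypothetical and not part of the code above. The sketch assumes tightly packed 24-bit RGB input with no row padding, and it has not been benchmarked. It handles four pixels per iteration: byte shifts move each pixel's three bytes to the start of its own 32-bit lane, masks select them, and the alpha byte is OR-ed in.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): expands packed 24-bit RGB
// to 32-bit RGBA with alpha = 0xFF, four pixels per iteration, SSE2 only.
void rgbToRgbaSse2(const std::uint8_t* rgb, std::uint8_t* rgba, std::size_t numPixels)
{
    // Loop-invariant constants, built once outside the loop.
    const __m128i alpha = _mm_set1_epi32(static_cast<int>(0xFF000000u));
    const __m128i keep0 = _mm_setr_epi32(0x00FFFFFF, 0, 0, 0);
    const __m128i keep1 = _mm_setr_epi32(0, 0x00FFFFFF, 0, 0);
    const __m128i keep2 = _mm_setr_epi32(0, 0, 0x00FFFFFF, 0);
    const __m128i keep3 = _mm_setr_epi32(0, 0, 0, 0x00FFFFFF);

    std::size_t i = 0;

    // Each iteration loads 16 bytes but consumes only 12, so the vector loop
    // runs only while at least 6 pixels (18 bytes) remain. This keeps every
    // 16-byte load inside the source buffer.
    for (; i + 6 <= numPixels; i += 4) {
        const __m128i v =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(rgb + 3 * i));

        // Shifting left by k bytes moves pixel k (source bytes 3k..3k+2)
        // to the first three bytes of 32-bit lane k.
        __m128i out = _mm_and_si128(v, keep0);
        out = _mm_or_si128(out, _mm_and_si128(_mm_slli_si128(v, 1), keep1));
        out = _mm_or_si128(out, _mm_and_si128(_mm_slli_si128(v, 2), keep2));
        out = _mm_or_si128(out, _mm_and_si128(_mm_slli_si128(v, 3), keep3));

        // Fill the fourth byte of every lane with opaque alpha.
        out = _mm_or_si128(out, alpha);

        _mm_storeu_si128(reinterpret_cast<__m128i*>(rgba + 4 * i), out);
    }

    // Scalar tail for the remaining pixels (at most 5).
    for (; i < numPixels; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];
        rgba[4 * i + 3] = 0xFF;
    }
}
```

For the 1920 x 1200 image above, this would be called as `rgbToRgbaSse2(RGB, RGBA, 1920 * 1200)`. Unlike a one-pixel-per-iteration loop that stores 16 bytes at a time, it never reads or writes past the end of either buffer, so the pixel count does not need to be reduced to stay in bounds.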