H264 Decoding with Apple VideoToolbox

I am trying to get an H264 streaming app working on various platforms using a combination of Apple VideoToolbox and OpenH264. There is one use case that doesn't work and I can't find any solution: when the source uses VideoToolbox on a 2011 iMac running macOS High Sierra and the receiver is a MacBook Pro running Big Sur.

On the receiver the decoded image is about 3/4 green. If I scale the image down to about 1/8 of the original before encoding, then it works fine. If I capture the frames on the MacBook and then run exactly the same decoding software in a test program on the iMac, it decodes fine. Doing the same on the MacBook (same image, same test program) gives 3/4 green again. I have a similar problem when receiving from an OpenH264 encoder on a slower Windows machine. I suspect that this has something to do with temporal processing, but I don't understand H264 well enough to work it out. One thing I did notice is that the decode call returns with no error code but a NULL pixel buffer about 70% of the time.

The "guts" of the decoding part looks like this (modified from a demo on GitHub)

// Decompression output callback: VTDecompressionSessionDecodeFrame passes the
// address of the caller's CVPixelBufferRef as sourceFrameRefCon, and the
// decoded frame is retained into it here.
void didDecompress (void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status,
                    VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer,
                    CMTime presentationTimeStamp, CMTime presentationDuration)
{
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    *outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
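
As a diagnostic aside, the infoFlags argument of the callback reports when the decoder discards a frame, which is worth logging given the NULL pixel buffers described above. A minimal sketch (didDecompressVerbose is just an illustrative name, not part of any API):

// A more defensive variant of the callback: it checks the decode status and
// the dropped-frame flag before touching the pixel buffer.
void didDecompressVerbose (void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status,
                           VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer,
                           CMTime presentationTimeStamp, CMTime presentationDuration)
{
    if (status != noErr)
        NSLog(@"decode callback failed, status=%d", (int)status);

    if (infoFlags & kVTDecodeInfo_FrameDropped)
        NSLog(@"decoder dropped a frame");

    if (status == noErr && pixelBuffer != NULL)
        *(CVPixelBufferRef *)sourceFrameRefCon = CVPixelBufferRetain(pixelBuffer);
}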

void initVideoDecodeToolBox ()
{
    if (!decodeSession)
    {
        const uint8_t* parameterSetPointers[2] = { mSPS, mPPS };
        const size_t parameterSetSizes[2] = { mSPSSize, mPPSSize };
        OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                              2, // param count
                                                                              parameterSetPointers,
                                                                              parameterSetSizes,
                                                                              4, // NAL length-prefix size
                                                                              &formatDescription);
        if (status == noErr)
        {
            // Request BGRA output. The dictionary is created with the CFType
            // callbacks so the keys and values are retained/released properly.
            uint32_t v = kCVPixelFormatType_32BGRA;
            CFNumberRef pixelFormat = CFNumberCreate(NULL, kCFNumberSInt32Type, &v);
            const void *keys[] = { kCVPixelBufferPixelFormatTypeKey };
            const void *values[] = { pixelFormat };
            CFDictionaryRef attrs = CFDictionaryCreate(NULL, keys, values, 1,
                                                       &kCFTypeDictionaryKeyCallBacks,
                                                       &kCFTypeDictionaryValueCallBacks);
            VTDecompressionOutputCallbackRecord callBackRecord;
            callBackRecord.decompressionOutputCallback = didDecompress;
            callBackRecord.decompressionOutputRefCon = NULL;
            status = VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription, NULL, attrs, &callBackRecord, &decodeSession);
            if (status == noErr)
            {
                // kVTDecompressionPropertyKey_RealTime is a session property,
                // not a pixel buffer attribute, so it is set on the session.
                VTSessionSetProperty(decodeSession, kVTDecompressionPropertyKey_RealTime, kCFBooleanTrue);
            }
            CFRelease(attrs);
            CFRelease(pixelFormat);
        }
        else
        {
            NSLog(@"IOS8VT: create decoder session failed status=%d", (int)status);
        }
    }
}

CVPixelBufferRef decode ( const char *NALBuffer, size_t NALSize )
{
    CVPixelBufferRef outputPixelBuffer = NULL;
    if (decodeSession && formatDescription)
    {
        // The NAL unit arrives with its length field stripped, so the
        // 4-byte big-endian length prefix has to be put back in front.
        MemoryBlock buf ( NALSize + 4 );
        memcpy ( (char*)buf.getData() + 4, NALBuffer, NALSize );
        *((uint32*)buf.getData()) = CFSwapInt32HostToBig ((uint32)NALSize);

        CMBlockBufferRef blockBuffer = NULL;
        OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, buf.getData(), NALSize + 4,
                                                             kCFAllocatorNull, NULL, 0, NALSize + 4, 0, &blockBuffer);

        if (status == kCMBlockBufferNoErr)
        {
            CMSampleBufferRef sampleBuffer = NULL;
            const size_t sampleSizeArray[] = { NALSize + 4 };
            status = CMSampleBufferCreateReady(kCFAllocatorDefault, blockBuffer, formatDescription,
                                               1, 0, NULL, 1, sampleSizeArray, &sampleBuffer);

            if (status == noErr && sampleBuffer)
            {
                VTDecodeFrameFlags flags = 0;
                VTDecodeInfoFlags flagOut = 0;

                // With no flags set the call is synchronous: didDecompress is
                // invoked before this returns and fills outputPixelBuffer via
                // the sourceFrameRefCon pointer passed here.
                OSStatus decodeStatus = VTDecompressionSessionDecodeFrame ( decodeSession, sampleBuffer, flags, &outputPixelBuffer, &flagOut );

                if (decodeStatus != noErr)
                {
                    DBG ( "decode failed status=" + String ( decodeStatus) );
                }
                CFRelease(sampleBuffer);
            }
            CFRelease(blockBuffer);
        }
    }
    return outputPixelBuffer;
}

Note: the NAL units don't have a 00 00 00 01 start code because they are streamed in blocks with an explicit length field.
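
For readers whose source emits Annex B start codes instead, those have to be rewritten as 4-byte big-endian length prefixes before the data is handed to VideoToolbox, because the NAL-unit-header length of 4 passed to CMVideoFormatDescriptionCreateFromH264ParameterSets commits the session to length-prefixed input. A minimal sketch, handling 4-byte start codes only (annexBToAvcc is an illustrative helper, not a system API):

#include <cstdint>
#include <cstring>
#include <vector>

// Convert an Annex B buffer (NALs separated by 00 00 00 01 start codes) into
// length-prefixed framing: each NAL preceded by its 4-byte big-endian length.
std::vector<uint8_t> annexBToAvcc (const uint8_t* data, size_t size)
{
    std::vector<uint8_t> out;
    size_t i = 0, start = 0;
    bool inNal = false;

    auto flush = [&] (size_t end)
    {
        if (! inNal) return;
        const uint32_t len = (uint32_t) (end - start);
        out.push_back ((uint8_t) (len >> 24));
        out.push_back ((uint8_t) (len >> 16));
        out.push_back ((uint8_t) (len >> 8));
        out.push_back ((uint8_t) len);
        out.insert (out.end(), data + start, data + end);
    };

    while (i + 3 < size)
    {
        if (data[i] == 0 && data[i+1] == 0 && data[i+2] == 0 && data[i+3] == 1)
        {
            flush (i);      // emit the NAL that just ended
            i += 4;         // skip the start code
            start = i;
            inNal = true;
        }
        else
            ++i;
    }
    flush (size);           // emit the final NAL
    return out;
}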

Decoding works fine in every other platform combination, and the encoded stream decodes fine with OpenH264.

1 Answer

Well, I finally found the answer, so I'm going to leave it here for posterity. It turns out that the VideoToolbox decode function expects all the NAL units that belong to the same frame to be copied into a single CMSampleBuffer. The older Mac is providing the app with single keyframes that are split into separate NAL units, which the app then sends individually across the network. Unfortunately this means that only the first NAL unit gets processed (in my case less than a quarter of the picture) and the rest are discarded. What you need to do is work out which NALs are part of the same frame and bundle them together. Unfortunately this requires you to partially parse the PPS and the frames themselves, which is not trivial. Many thanks to the post here at the Apple site which put me on the right track.
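
A rough sketch of the bundling, assuming NAL units arrive one at a time with their length fields stripped, as in decode() above. FrameAssembler and startsNewFrame are illustrative names, and the frame-start test is the common first_mb_in_slice shortcut rather than a full access-unit parser:

#include <cstdint>
#include <vector>

// A VCL NAL (type 1 = non-IDR slice, type 5 = IDR slice) starts a new picture
// when first_mb_in_slice == 0. first_mb_in_slice is the first ue(v) field
// after the 1-byte NAL header, and ue(0) is the single bit '1', so a set top
// bit in the second byte marks a frame start. (This shortcut ignores
// interlaced and multi-slice-group streams.)
static bool startsNewFrame (const uint8_t* nal, size_t size)
{
    if (size < 2) return false;
    const int type = nal[0] & 0x1F;
    return (type == 1 || type == 5) && (nal[1] & 0x80) != 0;
}

// Collects length-prefixed NALs until the next frame start, producing one
// buffer that can be handed to CMSampleBufferCreateReady as a single sample.
struct FrameAssembler
{
    std::vector<uint8_t> pending;   // length-prefixed NALs of the current frame
    bool haveSlice = false;

    // Returns a complete frame when `nal` begins the next picture, else empty.
    std::vector<uint8_t> push (const uint8_t* nal, size_t size)
    {
        std::vector<uint8_t> complete;
        if (haveSlice && startsNewFrame (nal, size))
        {
            complete.swap (pending);   // the previous frame is finished
            haveSlice = false;
        }
        // Append this NAL with its 4-byte big-endian length prefix.
        // Caveat: SPS/PPS/SEI units arriving between frames end up attached
        // to the frame in progress rather than the next one.
        const uint32_t len = (uint32_t) size;
        pending.push_back ((uint8_t) (len >> 24));
        pending.push_back ((uint8_t) (len >> 16));
        pending.push_back ((uint8_t) (len >> 8));
        pending.push_back ((uint8_t) len);
        pending.insert (pending.end(), nal, nal + size);

        if ((nal[0] & 0x1F) == 1 || (nal[0] & 0x1F) == 5)
            haveSlice = true;

        return complete;
    }
};

Each non-empty buffer returned by push() would then take the place of the single-NAL payload in decode(): one CMBlockBuffer, one entry in sampleSizeArray, and one VTDecompressionSessionDecodeFrame call per frame. A real implementation would also flush the final pending frame at end of stream.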