Decode h264 video bytes into JPEG frames in memory with ffmpeg


I'm using Python and ffmpeg (4.4.2) to generate an H.264 video stream from images produced continuously by a process. I am aiming to send this stream over a WebSocket connection, decode it into individual image frames at the receiving end, and emulate a stream by continuously pushing frames to an <img> tag in my HTML.

However, I cannot read images at the receiving end, despite trying combinations of the rawvideo input format, the image2pipe format, re-encoding the incoming stream with mjpeg and png, etc. So I would be happy to know what the standard way of doing something like this would be.

At the source, I'm piping frames from a while loop into ffmpeg to assemble an H.264 encoded video. My command is:

        command = [
            'ffmpeg',
            '-f', 'rawvideo',                  # raw frames come in on stdin
            '-pix_fmt', 'rgb24',
            '-s', f'{shape[1]}x{shape[0]}',    # width x height
            '-re',                             # read input at native frame rate
            '-i', 'pipe:',
            '-vcodec', 'h264',                 # encode to H.264
            '-f', 'rawvideo',                  # write the encoded packets with no container
            # '-vsync', 'vfr',
            '-hide_banner',
            '-loglevel', 'error',
            'pipe:'                            # output to stdout
        ]
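
For context, I drive this command roughly as follows; frame_source() is just a placeholder for the process that produces the images, and the encoded bytes on stdout are read in a separate thread and forwarded over the WebSocket:

    import subprocess
    import numpy as np

    # 'command' is the list defined above
    process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    for frame in frame_source():  # placeholder: yields uint8 arrays of shape (H, W, 3)
        process.stdin.write(frame.astype(np.uint8).tobytes())
    process.stdin.close()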

At the receiving end of the WebSocket connection, I can save the incoming frames to disk by including:

        command = [
            'ffmpeg',
            '-i', '-',              # read the H.264 stream from stdin
            '-c:v', 'mjpeg',        # re-encode each frame as JPEG
            '-f', 'image2',         # write individual image files
            '-hide_banner',
            '-loglevel', 'error',
            'encoded/img_%d_encoded.jpg'
        ]

in my ffmpeg command.

But I want to instead extract each individual frame coming in over the pipe and load it in my application, without saving anything to disk. So essentially, I want what the 'encoded/img_%d_encoded.jpg' output is doing, but with each frame accessible from the stdout pipe of an ffmpeg subprocess at the receiving end, running in its own thread (a rough sketch of what I mean follows the questions below).

  • What would be the most appropriate ffmpeg command to fulfil a use case like the above? And how could it be tuned for more speed or quality?
  • Would I be able to read from the stdout buffer with process.stdout.read(2560 * 1440 * 3) for each frame?
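
For concreteness, this is roughly the receive-side wiring I have in mind; handle_jpeg() is a placeholder for whatever pushes the frame to the <img> tag, and stdin is fed from the WebSocket handler in another thread:

    import subprocess

    command = [
        'ffmpeg',
        '-i', '-',               # H.264 stream arrives on stdin
        '-c:v', 'mjpeg',
        '-f', 'image2pipe',      # emit a sequence of JPEGs on stdout
        '-hide_banner',
        '-loglevel', 'error',
        'pipe:1'
    ]
    process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def read_jpeg_frames(stdout):
        """Split ffmpeg's stdout byte stream into individual JPEG images."""
        buffer = b''
        while True:
            chunk = stdout.read(4096)
            if not chunk:
                break
            buffer += chunk
            start = buffer.find(b'\xff\xd8')             # JPEG start-of-image marker
            end = buffer.find(b'\xff\xd9', start + 2)    # JPEG end-of-image marker
            while start != -1 and end != -1:
                yield buffer[start:end + 2]
                buffer = buffer[end + 2:]
                start = buffer.find(b'\xff\xd8')
                end = buffer.find(b'\xff\xd9', start + 2)

    for jpeg_bytes in read_jpeg_frames(process.stdout):
        handle_jpeg(jpeg_bytes)  # placeholder for my application code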

If you feel strongly about referring me to a more up-to-date version of ffmpeg, please do so.

PS: I understand this may not be the optimal way to create a stream. Nevertheless, I do not think there should be much complexity in this, and the latency should be low. I could instead send JPEG images over the WebSocket and view them in my <img> tag, but I want to save on bandwidth and offload some computational effort to the receiving end.


There is 1 answer below.

Christoph answered:

Firstly, I suggest using the "av" library (PyAV) for Python, which is an excellent wrapper around ffmpeg. It offers greater control and eliminates the need for subprocesses. Additionally, it provides more robust error handling.

Server

Create a WebSocket server to manage your H.264 stream, which I presume is an RTSP stream. Use av.open with container.demux to obtain av.Packet objects that you can then transmit to your client via WebSocket. Each packet is an H.264 packet.
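
A minimal sketch of that server, assuming the "websockets" package and a placeholder CAMERA_URL (the blocking demux loop is kept inline for brevity):

    import asyncio
    import av
    import websockets

    CAMERA_URL = 'rtsp://example.local/stream'  # placeholder source

    async def stream_handler(websocket):
        container = av.open(CAMERA_URL)
        video = container.streams.video[0]
        try:
            for packet in container.demux(video):
                if packet.dts is None:                # skip flush packets with no data
                    continue
                await websocket.send(bytes(packet))   # raw H.264 packet bytes
        finally:
            container.close()

    async def main():
        async with websockets.serve(stream_handler, '0.0.0.0', 8765):
            await asyncio.Future()                    # run forever

    asyncio.run(main())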

Client

Connect to the server via WebSocket and receive the H.264 packets. Set up an av.CodecContext for H.264, feed in the packets, and decode them. After decoding, you can convert frames to images using frame.to_image(), or, even better, create a codec context for MJPEG/JPEG and encode the frame to JPEG. Then you can handle the JPEG as needed.
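
On the client, the decode loop could look roughly like this; it uses frame.to_image() (Pillow) for brevity, and SERVER_URI is a placeholder that must match the server above. A separate MJPEG CodecContext would avoid the Pillow round trip:

    import asyncio
    import io
    import av
    import websockets

    SERVER_URI = 'ws://localhost:8765'  # placeholder

    async def receive():
        decoder = av.CodecContext.create('h264', 'r')
        async with websockets.connect(SERVER_URI) as websocket:
            async for data in websocket:
                for packet in decoder.parse(data):
                    for frame in decoder.decode(packet):
                        buf = io.BytesIO()
                        frame.to_image().save(buf, format='JPEG')
                        jpeg_bytes = buf.getvalue()
                        # hand jpeg_bytes to the application / <img> tag here

    asyncio.run(receive())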

The question arises as to why this process is not conducted directly on the client. I understand the intention is to save bandwidth, but bandwidth savings are only realized if you are transmitting H.264 packets. Thus, the server's role would primarily be to parse the stream, which is not as resource-intensive as decoding. The client would, in any case, undertake the more demanding tasks. If you are aiming for the best latency, you should consider avoiding an extra step that might not be needed.

The choice depends on the use case. If you are developing a streaming server that broadcasts a single source to different clients, this approach would make sense. If low latency is required, you should check if your camera supports MJPEG/HTTP, as this would be optimal because it eliminates the need for transcoding, despite the bandwidth downside.

Also, reevaluating your approach could be beneficial. For instance, using WebRTC might be more complex due to the necessity of a signaling server, but it allows the browser to work directly with H.264 without needing transcoding. There's an impressive Go project called "RTSPtoWebRTC" that handles all of this for you.

Furthermore, you could use fMP4, where the H.264 stream is encapsulated in a fragmented MP4 container, which also removes the need for transcoding, but it is more complex on the JS side.
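
As a rough illustration, the remux (no re-encoding) could be done with an ffmpeg command along these lines, with the resulting fragments appended to a MediaSource SourceBuffer in the browser:

    command = [
        'ffmpeg',
        '-i', '-',                      # incoming H.264 stream on stdin
        '-c:v', 'copy',                 # remux only, no transcoding
        '-movflags', 'frag_keyframe+empty_moov+default_base_moof',
        '-f', 'mp4',
        'pipe:1'
    ]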