Problem understanding audio stream number of samples when decoded with ffmpeg

391 Views Asked by At

The two streams I am decoding are an audio stream (adts AAC, 1 channel, 44100, 8-bit, 128bps) and a video stream (H264) which are received in an Mpeg-Ts stream, but I noticed something that doesn't make sense to me when I decode the AAC audio frames and try to line up the audio/video stream timestamps. I'm decoding the PTS for each video and audio frame, however I only get a PTS in the audio stream every 7 frames.

When I decode a single audio frame I get back 1024 samples, always. The frame rate is 30fps, so I see 30 frames each with 1024 samples which comes equals 30,720 samples and not the expected 44,100 samples. This is a problem when computing the timeline as the timestamps on the frames are slightly different between the audio and video streams. It's very close, but since I compute the timestamps via (1024 samples * 1,000 / 44,100 * 10,000 ticks) it's never going to line up exactly with the 30fps video.

Am I doing something wrong here with decoding the ffmpeg audio frames, or misunderstanding audio samples? And in my particular application, these timestamps are critical as I am trying to line up LTC timestamps which are decoded at the audio frame level, and lining those up with video frames.

FFProbe.exe:

Video:
r_frame_rate=30/1      
avg_frame_rate=30/1    
codec_time_base=1/60
time_base=1/90000      
start_pts=7560698279   
start_time=84007.758656

Audio:
r_frame_rate=0/0
avg_frame_rate=0/0
codec_time_base=1/44100
time_base=1/90000
start_pts=7560686278
start_time=84007.625311
0

There are 0 best solutions below