Different PTS values in ffmpeg and MP4 CTS values obtained using ctts and stts

I have been studying PTS values in .mp4 media files. The PTS of each frame in the video stream can be printed with the ffmpeg CLI using the showinfo filter:


    ffmpeg -hide_banner -i  -vf "showinfo" -f null -

For a sample .mp4 that I downloaded from the internet, this shows the following output.


    Press [q] to stop, [?] for help
    [Parsed_showinfo_0 @ 0x3741c00] config in time_base: 1/30, frame_rate: 30/1
    [Parsed_showinfo_0 @ 0x3741c00] config out time_base: 0/0, frame_rate: 0/0
    [Parsed_showinfo_0 @ 0x3741c00] n:   0 pts:      0 pts_time:0       pos:    58852 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:1 type:I checksum:49058BA3 plane_checksum:[E852D7DE 07E2B7D4 EA12FBD3] mean:[75 123 124] stdev:[52.7 4.8 11.7]
    [Parsed_showinfo_0 @ 0x3741c00]   side data - User Data Unregistered:
    [Parsed_showinfo_0 @ 0x3741c00] UUID=dc45e9bd-e6d9-48b7-962c-d820d923eeef
    [Parsed_showinfo_0 @ 0x3741c00] User Data=78323634202d20636f726520313535207231302062303062636166202d20482e3236342f4d5045472d342041564320636f646563202d20436f70796c65667420323030332d32303137202d20687474703a2f2f7777772e766964656f6c616e2e6f72672f783236342e68746d6c202d206f7074696f6e733a2063616261633d31207265663d34206465626c6f636b3d313a303a3020616e616c7973653d3078333a3078313133206d653d686578207375626d653d38207073793d31207073795f72643d312e30303a302e3030206d697865645f7265663d31206d655f72616e67653d3136206368726f6d615f6d653d31207472656c6c69733d32203878386463743d312063716d3d3020646561647a6f6e653d32312c313120666173745f70736b69703d31206368726f6d615f71705f6f66667365743d2d3220746872656164733d3334206c6f6f6b61686561645f746872656164733d3520736c696365645f746872656164733d30206e723d3020646563696d6174653d3120696e7465726c616365643d3020626c757261795f636f6d7061743d302073746974636861626c653d3120636f6e73747261696e65645f696e7472613d3020626672616d65733d3320625f707972616d69643d3220625f61646170743d3220625f626961733d30206469726563743d3320776569676874623d31206f70656e5f676f703d3020776569676874703d32206b6579696e743d696e66696e697465206b6579696e745f6d696e3d3330207363656e656375743d343020696e7472615f726566726573683d302072635f6c6f6f6b61686561643d35302072633d3270617373206d62747265653d3120626974726174653d353030302072617465746f6c3d312e302071636f6d703d302e36302071706d696e3d352071706d61783d3639207170737465703d342063706c78626c75723d32302e302071626c75723d302e35207662765f6d6178726174653d35353030207662765f62756673697a653d3135303030206e616c5f6872643d6e6f6e652066696c6c65723d302069705f726174696f3d312e34302061713d313a312e303000
    [Parsed_showinfo_0 @ 0x3741c00] 
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    Output #0, null, to 'pipe:':
      Metadata:
        major_brand     : mp42
        minor_version   : 0
        compatible_brands: mp42mp41isomavc1
        encoder         : Lavf59.27.100
      Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
        Metadata:
          creation_time   : 2018-01-23T22:02:00.000000Z
          handler_name    : L-SMASH Video Handler
          vendor_id       : [0][0][0][0]
          encoder         : Lavc59.37.100 wrapped_avframe
      Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s (default)
        Metadata:
          creation_time   : 2018-01-23T22:02:00.000000Z
          handler_name    : L-SMASH Audio Handler
          vendor_id       : [0][0][0][0]
          encoder         : Lavc59.37.100 pcm_s16le
    [Parsed_showinfo_0 @ 0x3741c00] n:   1 pts:      1 pts_time:0.0333333 pos:   149037 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:B checksum:2005D769 plane_checksum:[92BD4F7B 3501F48D 0CAA9352] mean:[75 124 124] stdev:[52.5 4.7 11.7]
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    [Parsed_showinfo_0 @ 0x3741c00] n:   2 pts:      2 pts_time:0.0666667 pos:   139805 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:B checksum:09AFB702 plane_checksum:[3E62184D 9D0A0753 8BF09762] mean:[75 124 124] stdev:[52.4 4.6 11.5]
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    [Parsed_showinfo_0 @ 0x3741c00] n:   3 pts:      3 pts_time:0.1     pos:   157017 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:B checksum:99F05FA9 plane_checksum:[FFA84276 7A3D6D59 0290AFCB] mean:[75 124 124] stdev:[52.2 4.5 11.3]
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    [Parsed_showinfo_0 @ 0x3741c00] n:   4 pts:      4 pts_time:0.133333 pos:   117259 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:P checksum:00935CD8 plane_checksum:[F81E097E 5F17005D B01452FD] mean:[74 124 124] stdev:[52.2 4.5 11.3]
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    [Parsed_showinfo_0 @ 0x3741c00] n:   5 pts:      5 pts_time:0.166667 pos:   197428 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:B checksum:30E77B4C plane_checksum:[393DAA75 DFA88599 E2164B2F] mean:[74 124 125] stdev:[52.3 4.4 11.0]
    [Parsed_showinfo_0 @ 0x3741c00] color_range:tv color_space:bt709 color_primaries:bt709 color_trc:bt709
    [Parsed_showinfo_0 @ 0x3741c00] n:   6 pts:      6 pts_time:0.2     pos:   187073 fmt:yuv420p sar:1/1 s:1920x1080 i:P iskey:0 type:B checksum:BD5C25BC plane_checksum:[CC66DD70 F4ACA5DB 955DA253] mean:[75 124 125] stdev:[52.2 4.4 10.8]

As seen above, the output shows a starting PTS of 0 for the first frame. However, I also looked at the ctts and stts entries in the MP4 headers with the help of ParseTimingInfoInMp4.py. That script shows a different PTS (0.0667 s) for the first frame, as seen below.


    ftyp    size              32
    mvhd    size             108
    iods    size              42
    tkhd    size              92
    edts    size              36
    mdhd    size              32
    Trak type:  b'vide'
    Video Trak Number 0 found
    video track timescale is 30
    mdhd    size              32
    hdlr    size              54
    vmhd    size              20
    dinf    size              36
    stsd    size             195
    stts size  2944  ctts size  2944
    0    dts = 0.0000 s,    pts = 0.0667 s,    diff in ms    66.67
    1    dts = 0.0333 s,    pts = 0.2000 s,    diff in ms    166.67
    2    dts = 0.0667 s,    pts = 0.1333 s,    diff in ms    66.67
    3    dts = 0.1000 s,    pts = 0.1000 s,    diff in ms    0.00
    4    dts = 0.1333 s,    pts = 0.1667 s,    diff in ms    33.33
    5    dts = 0.1667 s,    pts = 0.3333 s,    diff in ms    166.67
    6    dts = 0.2000 s,    pts = 0.2667 s,    diff in ms    66.67
    7    dts = 0.2333 s,    pts = 0.2333 s,    diff in ms    0.00
    8    dts = 0.2667 s,    pts = 0.3000 s,    diff in ms    33.33
    9    dts = 0.3000 s,    pts = 0.4333 s,    diff in ms    133.33
    10    dts = 0.3333 s,    pts = 0.3667 s,    diff in ms    33.33

MP4Analyser shows the following entries for stss, ctts, and edts -> elst for the video track.

[Image: stss entries for video track]
[Image: ctts entries for video track]
[Image: edts -> elst entry for video track]

The sample file I have been using can be found in Sample mp4.

Can someone please help me to understand:

  1. Why are the PTS values shown by ffmpeg different from the PTS values derived from stts and ctts?
  2. What is the correct process for deriving PTS from the stts, ctts and edts entries in the MP4 header?

Gyan On

By default, ffmpeg offsets timestamps so that the output starts at 0. Add -copyts to keep the input timestamps unchanged.

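Next, the decoder outputs frames in presentation order, which differs from storage (decode) order when the stream has B-frames. The showinfo output above is in presentation order, while a listing built directly from stts and ctts is in storage order.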
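
To illustrate the reordering, here is a minimal Python sketch that expands stts and ctts into per-sample DTS and PTS and then sorts the samples by PTS. The table contents are an assumption: they are reconstructed from the first 11 rows of the dump in the question (timescale 30), not read from the file, and the real run-length grouping may differ. The function names are made up for this example.

```python
def expand_runs(entries):
    """Expand run-length (sample_count, value) pairs, as stored in stts and ctts."""
    values = []
    for count, value in entries:
        values.extend([value] * count)
    return values


def decode_order_timestamps(stts, ctts):
    """Return (sample_index, dts, pts) tuples in storage order, in media timescale ticks."""
    deltas = expand_runs(stts)
    offsets = expand_runs(ctts)
    samples = []
    dts = 0
    for index, (delta, offset) in enumerate(zip(deltas, offsets)):
        samples.append((index, dts, dts + offset))  # pts = dts + composition offset
        dts += delta
    return samples


# Assumed table contents, reconstructed from the dump in the question (timescale 30).
stts = [(11, 1)]
ctts = [(1, 2), (1, 5), (1, 2), (1, 0), (1, 1), (1, 5),
        (1, 2), (1, 0), (1, 1), (1, 4), (1, 1)]

samples = decode_order_timestamps(stts, ctts)

# Storage order -> presentation order: sort by pts.
presentation_order = sorted(samples, key=lambda s: s[2])

for index, dts, pts in presentation_order:
    print(f"sample {index:2d}  dts={dts:2d}  pts={pts:2d}")
```

With these values, the first sample presented is sample 0 at pts 2 ticks (0.0667 s), followed by samples 3, 2, 4 and 1. That matches the I, B, B, B, P frame types reported by showinfo, apart from the timestamp offset.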

Edit list adjustment can get complex, so I'll address only the two simple cases. An initial dwell (an empty edit) is applied as a positive offset to all timestamps. An initial edit that trims the start of the media shifts all timestamps back by the edit's start time, so samples before the edit start point get negative timestamps.
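
The two simple edit list cases can be sketched as follows. This is only a sketch under stated assumptions, not what ffmpeg does internally: it handles leading empty edits followed by a single media edit and ignores everything else (multiple media edits, media_rate). The elst values used below, including media_time = 2 and the movie timescale of 600, are hypothetical, since the actual elst entry of the sample file is not reproduced in the text.

```python
from fractions import Fraction


def apply_simple_edit_list(pts_ticks, media_timescale, movie_timescale, elst):
    """Map media PTS ticks to presentation times in seconds.

    elst is a list of (segment_duration, media_time) pairs.
    segment_duration is in movie timescale units (mvhd),
    media_time is in media timescale units (mdhd); -1 marks an empty edit.
    Only leading empty edits plus the first media edit are considered.
    """
    shift = Fraction(0)
    for segment_duration, media_time in elst:
        if media_time == -1:
            # Initial dwell: delays everything by the empty edit's duration.
            shift += Fraction(segment_duration, movie_timescale)
        else:
            # Initial trim: presentation starts at media_time.
            shift -= Fraction(media_time, media_timescale)
            break
    return [Fraction(p, media_timescale) + shift for p in pts_ticks]


# Hypothetical trim: media_time = 2 ticks at timescale 30 (assumed, not from the file).
trimmed = apply_simple_edit_list([2, 3, 4, 5, 6], 30, 600, [(1000, 2)])

# Hypothetical dwell: a 300/600 = 0.5 s empty edit before the media starts.
delayed = apply_simple_edit_list([0, 1], 30, 600, [(300, -1), (1000, 0)])

# Samples before the edit start point end up with negative timestamps.
negative = apply_simple_edit_list([0, 1, 2, 3], 30, 600, [(1000, 2)])

print([float(t) for t in trimmed])
print([float(t) for t in delayed])
print([float(t) for t in negative])
```

If the file's elst really has media_time = 2, the sorted PTS values 2, 3, 4, ... from the question's dump map to 0, 1/30, 2/30, ... seconds, which would line up with the pts_time values printed by showinfo.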