I am generating MP4 files (with H.264 video and AAC audio) by transmuxing from MPEG-TS in JavaScript, to be played in the browser via blob URLs. Everything works fine in Chrome, and if I grab the blob URLs out of the developer console and download them, the generated files play fine in Windows Media Player as well. Firefox, however, claims that they are corrupted.
I've narrowed the issue down to a problem with the ESDS box in the audio metadata. If I repackage the source MPEG-TS files by some other means (like ffmpeg), and hand-edit my generated files in a hex editor to paste in the ESDS box from the equivalent file generated by other software, then Firefox is happy.
Here is my code that builds the ESDS box (and I'm tracking the issue as well).
I attempted to write it by a pretty straightforward transcribe-stuff-from-the-MPEG-specs process, but that is no guarantee that I did not screw it up. Since Chrome and Windows Media Player play my files just fine, I'm not sure whether it's actually an error in my file that they are somehow able to ignore, or a problem with Firefox. I suspect the former, but I'm just not sure.
Anyone got any insight, or perhaps a straightforward, easy-to-understand reference for how to build a proper ESDS box?
EDIT: Here are some different ESDS sections produced for the same input file (as hex bytes, copied out of my hex editor):
Mine:
00 00 00 27 65 73 64 73 00 00 00 00 03 22 00 00
02 04 14 40 15 00 00 00 00 00 3a f1 00 00 2d e6
05 02 12 10 06 01 02
mpegts:
00 00 00 33 65 73 64 73 00 00 00 00 03 80 80 80
22 00 02 00 04 80 80 80 14 40 15 00 00 00 00 00
00 00 00 00 00 00 05 80 80 80 02 12 10 06 80 80
80 01 02
ffmpeg:
00 00 00 2c 65 73 64 73 00 00 00 00 03 80 80 80
1b 00 02 00 04 80 80 80 0d 40 15 00 00 00 00 01
5f 42 00 00 00 00 06 80 80 80 01 02
Oddly, and I did not notice this before, Firefox will play the video with ffmpeg's output, but neither Firefox nor Windows Media Player will actually play the sound (Chrome does). Firefox and Windows Media Player are both happy to play the video with sound using the output from mpegts, though. With mine, Chrome and Windows Media Player will play the video with sound, but Firefox doesn't play it at all and claims the video is corrupted.
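To compare dumps like these without counting bytes by hand, a small descriptor walker can help. This is a sketch, not code from the asker's muxer or from mpegts; the names `hexToBytes`, `readSize`, `walk` and `walkEsds` are hypothetical. It assumes the ES_Descriptor carries no optional dependence/URL/OCR fields (so its fixed part is 3 bytes) and that the DecoderConfigDescriptor's fixed part is 13 bytes, which holds for all three dumps above.

```javascript
// Hypothetical helper: "00 00 00 2c ..." -> Uint8Array
function hexToBytes(hex) {
  return Uint8Array.from(hex.trim().split(/\s+/), (h) => parseInt(h, 16));
}

// Read an MPEG-4 expandable size field: 7 bits per byte, high bit = "another byte follows".
function readSize(bytes, pos) {
  let size = 0;
  let b;
  do {
    b = bytes[pos++];
    size = (size << 7) | (b & 0x7f);
  } while (b & 0x80);
  return { size, pos };
}

// Recursively list descriptors between start and end, flagging any declared
// length that runs past its parent (or past the end of the box).
function walk(bytes, start, end, depth, out) {
  let pos = start;
  while (pos < end) {
    const tag = bytes[pos];
    const { size, pos: bodyStart } = readSize(bytes, pos + 1);
    const bodyEnd = bodyStart + size;
    out.push({ tag, size, depth, overruns: bodyEnd > end });
    const limit = Math.min(bodyEnd, end);
    // Assumed fixed prefixes: ES_Descriptor = ES_ID (2) + flags (1) with no
    // optional fields; DecoderConfigDescriptor = 13 bytes before nested descriptors.
    if (tag === 0x03) walk(bytes, bodyStart + 3, limit, depth + 1, out);
    if (tag === 0x04) walk(bytes, bodyStart + 13, limit, depth + 1, out);
    pos = bodyEnd;
  }
}

function walkEsds(bytes) {
  const boxSize = new DataView(bytes.buffer, bytes.byteOffset).getUint32(0);
  const descriptors = [];
  // Skip size (4) + 'esds' (4) + version/flags (4).
  walk(bytes, 12, bytes.length, 0, descriptors);
  return { boxSize, actualLength: bytes.length, descriptors };
}
```

Fed the ffmpeg dump, it lists descriptors 03, 04 and 06 with no 05 (DecoderSpecificInfo). Fed the "Mine" dump, it flags the 03 descriptor's declared length of 0x22 (34 bytes) as running past the end of the 39-byte box.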
You have now found your solution by adding three 0x80 bytes after the ES Descriptor tag number. Glad that worked out for all browsers.
Let me share one insight that may help you or future users of your code:
Looking at mp4ESDSbox.java, we see that the ESDS atom is broken into five sections, and each section is padded with the bytes 80 80 80. These three bytes are described as an "optional extended descriptor type tag string", with possible type values of 80, 81, or FE. You're on the right path, but you have only padded the first section.
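For what it's worth, the MPEG-4 Systems spec (ISO/IEC 14496-1) gives another way to read those bytes: they are part of each descriptor's length field, which is "expandable", carrying 7 bits of the length per byte with the high bit set when another byte follows. Under that reading, `80 80 80 22` and a bare `22` both encode a length of 0x22; the padded form just uses the maximum width of four bytes. A minimal sketch of both directions (`encodeDescriptorSize` and `decodeDescriptorSize` are hypothetical names, not from MP4Muxer.js or mp4ESDSbox.java):

```javascript
// Encode a descriptor length as an MPEG-4 expandable size field.
// minBytes pads to a fixed width; minBytes = 4 yields the 80 80 80 xx form.
function encodeDescriptorSize(size, minBytes = 1) {
  if (!Number.isInteger(size) || size < 0 || size >= 2 ** 28) {
    throw new RangeError("descriptor size must fit in 28 bits (4 bytes x 7 bits)");
  }
  const groups = [];
  do {
    groups.unshift(size & 0x7f); // low 7 bits go last
    size >>>= 7;
  } while (size > 0);
  while (groups.length < minBytes) groups.unshift(0);
  // Set the "another byte follows" bit on every byte except the last.
  return groups.map((g, i) => (i < groups.length - 1 ? g | 0x80 : g));
}

// Decode an expandable size field, stopping at the first byte without the high bit.
function decodeDescriptorSize(bytes) {
  let size = 0;
  for (const b of bytes) {
    size = (size << 7) | (b & 0x7f);
    if ((b & 0x80) === 0) break;
  }
  return size;
}
```

For example, `encodeDescriptorSize(0x22, 4)` gives `[0x80, 0x80, 0x80, 0x22]`, and decoding either that array or `[0x22]` gives 0x22.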
MP4Muxer.js, (A) what you currently have:
00 00 00 27 65 73 64 73 00 00 00 00 03 80 80 80
22 00 00 02 04 14 40 15 00 00 00 00 00 3A F1 00
00 2D E6 05 02 12 10 06 01 02
MP4Muxer.js, (B) what it should be:
00 00 00 33 65 73 64 73 00 00 00 00 03 80 80 80
22 00 00 02 04 80 80 80 14 40 15 00 00 00 00 00
3A F1 00 00 2D E6 05 80 80 80 02 12 10 06 80 80
80 01 02
FFmpeg ESDS for a random AAC track (compare against the new version B):
00 00 00 33 65 73 64 73 00 00 00 00 03 80 80 80
22 00 01 00 04 80 80 80 14 40 15 00 00 00 00 01
F4 74 00 01 F4 74 05 80 80 80 02 12 10 06 80 80
80 01 02
Comparing the byte structure of version B against the one made by FFmpeg, we now see perfect alignment. Some values differ slightly because they are not made from the same audio data.
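Putting the padding together, a builder that writes the four-byte padded length for every descriptor produces this aligned layout. This is a minimal sketch, not the actual MP4Muxer.js code; `buildEsds` and its parameter names are hypothetical. It assumes AAC (objectTypeIndication 0x40), an ES_Descriptor with no optional fields, and descriptor bodies under 128 bytes so each length fits in the final size byte.

```javascript
// Hypothetical sketch, not MP4Muxer.js: builds an AAC esds box in which every
// descriptor length is written in the padded four-byte form (80 80 80 xx).
function buildEsds({ esId, bufferSizeDB = 0, maxBitrate, avgBitrate, audioSpecificConfig }) {
  const u16 = (v) => [(v >>> 8) & 0xff, v & 0xff];
  const u24 = (v) => [(v >>> 16) & 0xff, (v >>> 8) & 0xff, v & 0xff];
  const u32 = (v) => [(v >>> 24) & 0xff, (v >>> 16) & 0xff, (v >>> 8) & 0xff, v & 0xff];

  // tag + padded 4-byte length + body; this sketch only handles bodies < 128 bytes.
  const descriptor = (tag, body) => {
    if (body.length > 0x7f) throw new RangeError("body too long for this sketch");
    return [tag, 0x80, 0x80, 0x80, body.length, ...body];
  };

  const decoderSpecificInfo = descriptor(0x05, [...audioSpecificConfig]);
  const decoderConfig = descriptor(0x04, [
    0x40, // objectTypeIndication: MPEG-4 Audio (ISO/IEC 14496-3)
    0x15, // streamType 5 (audio) << 2 | upStream 0 << 1 | reserved 1
    ...u24(bufferSizeDB),
    ...u32(maxBitrate),
    ...u32(avgBitrate),
    ...decoderSpecificInfo,
  ]);
  const slConfig = descriptor(0x06, [0x02]); // predefined = 2 (reserved for MP4 files)
  const esDescriptor = descriptor(0x03, [
    ...u16(esId),
    0x00, // no dependsOn / URL / OCR fields, streamPriority 0
    ...decoderConfig,
    ...slConfig,
  ]);

  const body = [0x00, 0x00, 0x00, 0x00, ...esDescriptor]; // version 0, flags 0
  const type = [0x65, 0x73, 0x64, 0x73]; // 'esds'
  return Uint8Array.from([...u32(8 + body.length), ...type, ...body]);
}
```

Called with esId 1, both bitrates set to 0x0001F474 and AudioSpecificConfig `12 10`, it reproduces the 51-byte FFmpeg dump above byte for byte. It writes the byte after ES_ID as 00, as FFmpeg does, whereas version (B) has `00 00 02` there.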
Notice that we have changed the first four bytes (the size integer) from the original 0x27 (decimal 39 bytes) to 0x33 (decimal 51 bytes).
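The 39-to-51 change can be checked by summing the parts. A small sketch, assuming the same four descriptors (03, 04, 05, 06) and a 2-byte AudioSpecificConfig; `esdsBoxSize` is a hypothetical helper, and its 1-byte length mode is only valid while every descriptor body is under 128 bytes.

```javascript
// Byte count of an esds box laid out like the dumps above.
// padded = true: every descriptor length uses 4 bytes (80 80 80 xx);
// padded = false: every length uses 1 byte (bodies must be < 128 bytes).
function esdsBoxSize(ascLength, padded) {
  const sizeField = padded ? 4 : 1;
  const descriptor = (bodyLength) => 1 + sizeField + bodyLength; // tag + length + body
  const decoderSpecificInfo = descriptor(ascLength);
  const decoderConfig = descriptor(13 + decoderSpecificInfo); // 13 fixed bytes + nested 05
  const slConfig = descriptor(1); // predefined byte
  const esDescriptor = descriptor(3 + decoderConfig + slConfig); // ES_ID (2) + flags (1)
  return 4 + 4 + 4 + esDescriptor; // size + 'esds' + version/flags
}
```

`esdsBoxSize(2, false)` returns 39 and `esdsBoxSize(2, true)` returns 51; the 12-byte difference is the three 0x80 bytes added to each of the four descriptors.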