I have a TTML file that contains video captions. I want to go through all the time/caption pairs and write them to a JSON file. I tried https://www.npmjs.com/package/ttml?activeTab=readme but it did not work. Any ideas? Thank you.
How to go through a TTML file and get all the time/caption pairs into a JSON file
450 Views Asked by Lydia halls At
There are 2 best solutions below
For folks who prefer Python, ttconv can split TTML/IMSC documents into a series of Intermediate Synchronic Documents (ISDs), each corresponding to a period of time during which the content of the TTML/IMSC document is static.
import ttconv.imsc.reader
import ttconv.isd
import xml.etree.ElementTree as et
tt_doc = """<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="fr" xmlns="http://www.w3.org/ns/ttml">
<body>
<div>
<p begin="1s" end="2s">Hello</p>
<p begin="3s" end="4s">Bonjour</p>
</div>
</body>
</tt>"""
m = ttconv.imsc.reader.to_model(et.ElementTree(et.fromstring(tt_doc)))
st = ttconv.isd.ISD.significant_times(m)
for t in st:
    isd = ttconv.isd.ISD.from_model(m, t)
    # convert ISD to JSON
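If the document only uses flat timing like the sample above (every `<p>` carries its own `begin` and `end`), a standard-library sketch may be enough to get time/caption pairs into JSON without building ISDs. The helper name `ttml_to_pairs` is made up for illustration. It assumes the default TTML namespace, keeps time expressions as raw strings, and ignores timing inherited from parent elements, nested timed spans, and `<br/>` line breaks.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by the sample document above
TTML_NS = "{http://www.w3.org/ns/ttml}"

def ttml_to_pairs(ttml_text):
    """Collect begin/end/text for every <p> that carries its own timing."""
    root = ET.fromstring(ttml_text)
    pairs = []
    for p in root.iter(TTML_NS + "p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # timing inherited from a parent is not handled here
        # itertext() also picks up text inside nested <span> elements;
        # split/join collapses runs of whitespace
        text = " ".join("".join(p.itertext()).split())
        pairs.append({"begin": begin, "end": end, "text": text})
    return pairs

sample = """<tt xml:lang="fr" xmlns="http://www.w3.org/ns/ttml">
<body><div>
<p begin="1s" end="2s">Hello</p>
<p begin="3s" end="4s">Bonjour</p>
</div></body>
</tt>"""

print(json.dumps(ttml_to_pairs(sample), indent=2))
```

For documents with overlapping or nested timing, going through the ISDs is the safer route.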
ttconv also supports conversion from TTML/IMSC to SRT, a simple text-based format. All styling information is lost, however.
tt.py convert -i <input .ttml file> -o <output .srt file> --otype SRT --itype TTML
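Once you have an SRT file, the time/caption pairs are easy to pull out because each cue is a counter line, a timing line, and one or more text lines, separated from the next cue by a blank line. A rough sketch, assuming well-formed SRT with `HH:MM:SS,mmm` timestamps; `srt_to_pairs` is a hypothetical helper, not part of ttconv:

```python
import json
import re

# SRT timing line, e.g. "00:00:01,000 --> 00:00:02,000"
TIMING = re.compile(r"(\d{2,}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2},\d{3})")

def srt_to_pairs(srt_text):
    """Return a list of {"begin", "end", "text"} dicts, one per SRT cue."""
    pairs = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line.strip())
            if m:
                # Everything after the timing line is the caption text
                text = "\n".join(lines[i + 1:]).strip()
                pairs.append({"begin": m.group(1), "end": m.group(2), "text": text})
                break
    return pairs

sample_srt = """1
00:00:01,000 --> 00:00:02,000
Hello

2
00:00:03,000 --> 00:00:04,000
Bonjour
"""

print(json.dumps(srt_to_pairs(sample_srt), indent=2))
```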
Try looking at https://github.com/sandflow/imscJS for code that extracts the Intermediate Synchronic Documents (ISDs); the file isd.js, for example, may be relevant.
By the way, it's worth noting that the data model in TTML doesn't exactly match the idea of a mapping between pairs of times and individual captions. You may get duplications.
Each ISD is a snapshot between two moments on the timeline in which the presented content does not change.
This is an important distinction because in TTML it is possible to have the same "caption" appear at times that overlap with other captions appearing and disappearing, for example:
So the result in ISDs is:
As you can see, the first line appears in two ISDs. How you deal with this in your application is up to you, of course.
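A minimal sketch of that effect in Python, using made-up captions and a hypothetical helper `isd_sequence` (this is not imscJS or ttconv code, just the interval logic, with times in seconds):

```python
import json

def isd_sequence(captions):
    """captions: list of (begin, end, text) tuples, times in seconds."""
    # Every begin or end is a moment where the display may change
    times = sorted({t for begin, end, _ in captions for t in (begin, end)})
    isds = []
    for start, stop in zip(times, times[1:]):
        # A caption is active on [begin, end): shown at begin, gone at end
        active = [text for begin, end, text in captions if begin <= start < end]
        if active:  # skip gaps where nothing is displayed
            isds.append({"begin": start, "end": stop, "captions": active})
    return isds

captions = [
    (0, 4, "First line"),   # stays on screen for the whole four seconds
    (2, 4, "Second line"),  # joins it halfway through
]

print(json.dumps(isd_sequence(captions), indent=2))
```

Here "First line" is reported once for 0 to 2 seconds and again for 2 to 4 seconds, alongside "Second line". If you want one JSON entry per caption rather than per ISD, you would need to merge consecutive ISDs that repeat the same text.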