File .vtt:
WEBVTT
00:00:00.039 --> 00:00:25.968
VINCENZO Cassano!
00:00:26.044 --> 00:00:26.961
Damn it.
00:01:23.434 --> 00:01:24.894
Mr. Vincenzo Cassano.
00:01:24.978 --> 00:01:27.814
You're under arrest
for the murder of Mr. Oh Jeong-bae.
00:01:43.913 --> 00:01:44.956
Hands up,
00:01:45.540 --> 00:01:46.708
or I'll fire.
00:01:51.504 --> 00:01:52.964
I didn't do it.
Transformed into json:
[
{
"from":"00:00:00.039",
"to":"00:00:25.968",
"timeString":"00:00:00.039 --> 00:00:25.968",
"text":"VINCENZO Cassano!"
},
....,
{
"from":"00:01:24.978",
"to":"00:01:27.814",
"timeString":"00:01:24.978 --> 00:01:27.814",
"text":"You're under arrest\nfor the murder of Mr. Oh Jeong-bae." <- multi-line text; I assume the line break becomes a \n?
}
]
I have a .vtt subtitle file and need to turn it into a JSON array like the one above, keeping multi-line text together.
I wrote the code below, which should remove the leading WEBVTT and the blank lines, but I can't get rid of the leading empty entry at index 0, as seen in the image below (I may have fixed this by adding .replace('\n', '')).
const v = enc.decode(text).replace('WEBVTT', '').replace(/[\r\n]{2,}/g, '\n').replace('\n', '');
const lines = v.split('\n');
// Expression to check whether a line contains a timestamp such as 00:00:00.039
const timestamp = /\b(\d{2}:\d{2}:\d{2})\.(\d{3})\b/;
// Print each line with its index and whether it contains a timestamp
lines.forEach((el, i) => console.log(`${i} - ${el} - ${timestamp.test(el)}`));
Can you give me a hand?
Might I suggest SRT/VTT Parser? Its output would get you part of the way there, and then you just have to do a little reformatting in your own code.
You could also have a look at this answer, which uses another parser, Node VTT.
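If pulling in a library is not an option, a hand-rolled version is fairly short. What follows is only a minimal sketch, not a full WebVTT parser: parseVtt is a hypothetical helper name, and it assumes every timing line uses the hh:mm:ss.mmm form shown in the question. It walks the file line by line, starts a new object on each timing line, and joins any following text lines with \n, which is what the multi-line entry in the question expects.

```javascript
// Hypothetical helper (not from any library): converts WebVTT text into
// an array of { from, to, timeString, text } objects.
// Assumes timestamps always look like 00:00:00.039 (hours included).
function parseVtt(source) {
  // Matches a timing line such as "00:00:00.039 --> 00:00:25.968".
  const timing = /^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})/;
  const cues = [];
  let current = null;

  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    const match = timing.exec(line);
    if (match) {
      // A timing line starts a new cue.
      current = {
        from: match[1],
        to: match[2],
        timeString: `${match[1]} --> ${match[2]}`,
        text: '',
      };
      cues.push(current);
    } else if (line === '') {
      // A blank line ends the current cue.
      current = null;
    } else if (current) {
      // Text lines of the same cue are joined with "\n".
      current.text = current.text ? `${current.text}\n${line}` : line;
    }
    // Anything else (the WEBVTT header, cue identifiers) is ignored.
  }
  return cues;
}

// Usage with a shortened sample taken from the question.
const sample = [
  'WEBVTT',
  '',
  '00:01:24.978 --> 00:01:27.814',
  "You're under arrest",
  'for the murder of Mr. Oh Jeong-bae.',
  '',
  '00:01:43.913 --> 00:01:44.956',
  'Hands up,',
].join('\n');

console.log(JSON.stringify(parseVtt(sample), null, 2));
```

Because the header and blank lines are simply skipped, there is no need for the chained .replace() calls, and no empty entry ends up at index 0. In the question's code the decoded string would be passed in, e.g. parseVtt(enc.decode(text)).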