My goal is to download a YouTube video with auto-generated subtitles in a separate file like .vtt, .srt, etc.
I am currently trying to achieve this with youtube-dl, but I am open to other solutions if needed.
When I run the following command, it downloads the video as .mp4 (which is fine) and a separate .vtt file, but the .vtt seems to be messed up somehow and displays all the text for the whole clip at once instead of at the specified times.
Command I am running:
youtube-dl --write-auto-sub "https://www.youtube.com/watch?v=Roc89oOZOF4&list=PLJBo3iyb1U0eNNN4Dij3N-d0rCJpMyAKQ&index=45"
Downloads this .vtt:
WEBVTT
Kind: captions
Language: en
00:00:05.960 --> 00:00:08.290 align:start position:0%
thank <00:00:06.003><c>you </c><00:00:06.046><c>ah </c><00:00:06.089><c>crap </c><00:00:06.132><c>well </c><00:00:06.175><c>looks </c><00:00:06.218><c>like </c><00:00:06.261><c>the </c><00:00:06.304><c>good </c><00:00:06.347><c>Lord </c><00:00:06.390><c>just </c><00:00:06.433><c>sent </c><00:00:06.476><c>me </c><00:00:06.519><c>a </c><00:00:06.562><c>conversation </c><00:00:06.605><c>starter </c><00:00:06.648><c>come </c><00:00:06.691><c>here </c><00:00:06.734><c>Jesse </c><00:00:06.777><c>come </c><00:00:06.820><c>get </c><00:00:06.863><c>the </c><00:00:06.906><c>ball </c><00:00:06.949><c>hmm</c>
00:00:08.290 --> 00:00:10.549 align:start position:0%
thank you ah crap well looks like the good Lord just sent me a conversation starter come here Jesse come get the ball hmm
00:00:10.549 --> 00:00:13.070 align:start position:0%
00:00:13.070 --> 00:00:15.470 align:start position:0%
00:00:15.470 --> 00:00:23.750 align:start position:0%
00:00:23.750 --> 00:00:23.760 align:start position:0%
00:00:23.760 --> 00:00:26.480 align:start position:0%
I have read that this may be done on purpose by YouTube.
Even if this is true, is there any way to convert this .vtt to a usable format or simply download correctly-formatted auto-generated subtitles from YouTube?
Python, FFmpeg, or command-line solutions are preferred, but anything is helpful!
Thanks! Any and all assistance is greatly appreciated!
To convert the .vtt, I use this one.
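A minimal Python sketch of such a conversion (not the tool referred to above), assuming the file follows the layout shown in the question: a timing line in hh:mm:ss.mmm form followed by text rows that may carry inline tags such as <00:00:06.003> and <c>. The names parse_vtt and to_srt are made up for illustration.

```python
import re

# Timing line as it appears in the question's file, e.g.
# "00:00:05.960 --> 00:00:08.290 align:start position:0%"
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})"
)
# Inline word timestamps and <c>...</c> markers
TAG = re.compile(r"<[^>]+>")


def parse_vtt(text):
    """Return a list of (start, end, rows) with SRT-style timestamps."""
    cues = []
    current = None
    for raw in text.splitlines():
        if not raw:
            current = None  # an empty line ends the current cue
            continue
        match = TIMING.match(raw)
        if match:
            start = f"{match.group(1)},{match.group(2)}"
            end = f"{match.group(3)},{match.group(4)}"
            current = (start, end, [])
            cues.append(current)
            continue
        if current is None:
            continue  # header lines such as WEBVTT, Kind:, Language:
        row = " ".join(TAG.sub("", raw).split())  # strip tags, tidy spaces
        if row:
            current[2].append(row)
    return [cue for cue in cues if cue[2]]  # drop cues without any text


def to_srt(cues):
    """Format (start, end, rows) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, rows) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{start} --> {end}\n" + "\n".join(rows))
    return "\n\n".join(blocks) + "\n"
```

To use it, read the downloaded .vtt as UTF-8 text, pass the text to parse_vtt, and write the result of to_srt to a .srt file. The output still contains the repeated rows that YouTube's rolling captions produce, so it is only the first half of the job.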
To remove duplicates, I use this. There are still some problems, so you need to check the results.
It removes duplicates of the 2nd row of the previous line from the 1st row of the current line (see the 1st row of 11 [original]).
Lines are still repeated, but now the repeat is the same row (see 11 and 12),
which becomes
Edit: someone has already made an app to fix the duplicate lines; find it here: https://github.com/bindestriche/srt_fix
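As a rough illustration of the duplicate removal described above (not the code of srt_fix, and not necessarily what the original script did), the sketch below drops any row that repeats the row emitted just before it and discards subtitles that end up empty. It assumes the subtitles are already parsed into (start, end, rows) tuples with inline tags stripped; the function name is hypothetical.

```python
def drop_rolling_duplicates(cues):
    """Remove rows that merely repeat the previously kept row.

    cues: list of (start, end, rows) tuples, where rows is a list of
    plain-text strings. Returns a new list in the same shape.
    """
    cleaned = []
    last_row = None
    for start, end, rows in cues:
        fresh = []
        for row in rows:
            if row != last_row:  # keep only text not just shown
                fresh.append(row)
                last_row = row
        if fresh:  # subtitles that only repeated old text are dropped
            cleaned.append((start, end, fresh))
    return cleaned


# Typical rolling-caption pattern: each new subtitle repeats the
# previous row, and very short subtitles repeat it once more.
example = [
    ("00:00:05,960", "00:00:08,290", ["thank you ah crap"]),
    ("00:00:08,290", "00:00:08,300", ["thank you ah crap"]),
    ("00:00:08,300", "00:00:10,549", ["thank you ah crap", "well looks like"]),
    ("00:00:10,549", "00:00:10,559", ["well looks like"]),
]
deduplicated = drop_rolling_duplicates(example)
```

One known limitation: if a speaker really does say the same row twice in a row, the second occurrence is removed as well, so the output is still worth checking by hand.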