Problem with pytube caption selection after new update


The language code format of pytube captions seems to have changed.

from pytube import YouTube
video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
yt = YouTube(video_link)
print(yt.captions)

The result now looks like this:

{'a.de': <Caption lang="German (auto-generated)" code="a.de">, 'de.CcQ45jRV4-E': <Caption lang="German - deutsch" code="de.CcQ45jRV4-E">}

Before, I could extract the subtitles simply with yt.captions.get_by_language_code('de').

But now that the caption's language code has become de.CcQ45jRV4-E, I need to use yt.captions.get_by_language_code('de.CcQ45jRV4-E').

Although this works, I don't know whether this language code is fixed. How can I use a string wildcard to get the caption I want? Something like: yt.captions.get_by_language_code('de*')

Thank you.

Best answer

Iterate over the captions:

from pytube import YouTube
video_link = r'https://www.youtube.com/watch?v=w7daiJHfjoY'
yt = YouTube(video_link)

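caption = None  # stays None if no matching caption is found
for c in yt.captions:
    # match codes such as "de.CcQ45jRV4-E", but not "a.de"
    if c.code.startswith("de."):
        caption = c
        break
print(caption)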

This assumes that there will always be a dot after "de". For more complex matching, use a regex, but I don't think it's necessary.
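
If the dot assumption turns out to be too strict, the regex variant could look roughly like the sketch below. It accepts both the old-style code ("de") and the new-style code ("de.<suffix>"), and skips auto-generated codes such as "a.de". The helper name find_caption is hypothetical and not part of pytube; the only assumption about the caption objects is that they expose a .code attribute, as in the loop above. Stand-in objects are used here so the example runs without a network call.

```python
import re
from types import SimpleNamespace


def find_caption(captions, lang):
    """Return the first caption whose code is `lang` or `lang.<suffix>`, else None.

    Hypothetical helper, not part of pytube. `captions` is any iterable of
    objects that have a `.code` attribute.
    """
    # e.g. for lang="de": matches "de" and "de.CcQ45jRV4-E", but not "a.de"
    pattern = re.compile(rf"{re.escape(lang)}(\..+)?")
    for caption in captions:
        if pattern.fullmatch(caption.code):
            return caption
    return None


# Stand-in objects that mimic the codes shown in the question
sample = [
    SimpleNamespace(code="a.de"),
    SimpleNamespace(code="de.CcQ45jRV4-E"),
]

match = find_caption(sample, "de")
print(match.code if match else "no match")  # prints de.CcQ45jRV4-E
```

With a real video, the call would be find_caption(yt.captions, "de"), assuming that iterating over yt.captions yields Caption objects as it does in the loop above.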