YouTube Playlist API does not return all videos in a channel


We are trying to get all videos in a channel with the code below. The channel's uploads playlist has 291k videos; we figured out the channel id (and replaced the second character "C" in the id with "U" to get the uploads playlist id) and are iterating over 50 videos at a time. We get only up to some 20k videos, not more than that. Any idea how to fix this and get all 291k videos in this channel? We checked this with a variety of channels that have a large number of videos, and all show the same problem.

from googleapiclient.discovery import build

api_key = "my Google YouTube API V3 key"
youtube = build('youtube', 'v3', developerKey=api_key)

def get_channel_videos():
    videos = []
    next_page_token = None
    while True:
        res = youtube.playlistItems().list(playlistId="UU...", 
                                           part='snippet', 
                                           maxResults=50,
                                           pageToken=next_page_token).execute()
        videos += res['items']
        next_page_token = res.get('nextPageToken')
        if next_page_token is None:
            break
    return videos
videos = get_channel_videos()
with open("video.txt", "a") as myfile:
    for video in videos:
        myfile.write(f"{video['snippet']['resourceId']['videoId']} => {video['snippet']['title']}\n")

print(f"Total video count => {len(videos)}")
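
For reference, the uploads playlist id can also be looked up via the API instead of editing the channel id by hand; a minimal sketch reusing the youtube client above (channel_id below is a placeholder):

# Look up the uploads playlist id for a channel (channel_id is a placeholder).
channel_id = "UC..."
res = youtube.channels().list(part='contentDetails', id=channel_id).execute()
uploads_playlist_id = res['items'][0]['contentDetails']['relatedPlaylists']['uploads']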

There are 3 answers below

Benjamin Loison (best answer)

I investigated many different approaches, and the only one that seems to work perfectly is the following, based on web-scraping the Videos tab of the specified channel:

import requests
from lxml import html
import json

CHANNEL_HANDLE = '@MLB'
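# Download the HTML of the channel's Videos tab.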
text = requests.get(f'https://www.youtube.com/{CHANNEL_HANDLE}/videos').text
tree = html.fromstring(text)

ytVariableName = 'ytInitialData'
ytVariableDeclaration = ytVariableName + ' = '
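# Find the <script> tag that declares ytInitialData and parse its JSON payload.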
for script in tree.xpath('//script'):
    scriptContent = script.text_content()
    if ytVariableDeclaration in scriptContent:
        ytVariableData = json.loads(scriptContent.split(ytVariableDeclaration)[1][:-1])
        break

contents = ytVariableData['contents']['twoColumnBrowseResultsRenderer']['tabs'][1]['tabRenderer']['content']['richGridRenderer']['contents']

videoIds = set()

def treatContents(contents):
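    # Collect the video ids from this batch of grid items and return the continuation token for the next batch.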
    for content in contents:
        if not 'richItemRenderer' in content:
            break
        videoId = content['richItemRenderer']['content']['videoRenderer']['videoId']
        videoIds.add(videoId)
    print(len(videoIds))
    return getContinuationToken(contents)

def getContinuationToken(contents):
    # A batch sometimes contains 29 actual results instead of 30.
    lastContent = contents[-1]
    if not 'continuationItemRenderer' in lastContent:
        exit(0)
    return lastContent['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token']

continuationToken = treatContents(contents)

url = 'https://www.youtube.com/youtubei/v1/browse'
headers = {
    'Content-Type': 'application/json'
}
requestData = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20240313.05.00'
        }
    }
}
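# Page through the remaining batches via YouTube's internal browse endpoint; the script terminates inside getContinuationToken() once no continuation token is left.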
while True:
    requestData['continuation'] = continuationToken
    data = requests.post(url, headers = headers, json = requestData).json()
    # This can happen intermittently; retry the request.
    if not 'onResponseReceivedActions' in data:
        print('Retrying')
        continue
    continuationItems = data['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems']
    continuationToken = treatContents(continuationItems)

While the @MLB About tab claims 291,597 videos, my method finds 289,814 unique videos. It is unknown where the count difference comes from; it possibly stems from live streams and unlisted videos.
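
If you also want a video.txt output like in the question, one possible adaptation (just a sketch; it writes only the ids, since titles would need extra extraction) is to replace the exit(0) call in getContinuationToken with a small helper such as:

def dumpAndExit():
    # Hypothetical helper: persist the collected video ids, then stop.
    with open('video.txt', 'a') as f:
        for videoId in sorted(videoIds):
            f.write(f'{videoId}\n')
    print(f'Total video count => {len(videoIds)}')
    exit(0)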

VonC

iterating over 50 videos at a time. We get only up to some 20k videos, not more than that

Not surprising, considering the YouTube API quota limit, detailed in "Youtube API limits: How to calculate API usage cost and fix exceeded API quota".

Benjamin Loison points to "How to extract metadata for more than 20000 videos from channel using YouTube Data API v3?", which mentions a hard-coded server-side API limit.

But in both cases (quota or hard-coded limit), you could try one of the following:

  • Incremental Retrieval: If you don't need to retrieve all the videos at once, you can retrieve them incrementally over multiple runs of your script. Store the nextPageToken value and the retrieved video information in a file or database, and resume the retrieval process from where you left off in the next run (a sketch follows after the key-rotation example below).

  • API Key Rotation: If you have multiple API keys, you can rotate them to distribute the API requests across different keys. This can help you avoid hitting the quota limit for a single API key.

The second approach (API key rotation) would look like:

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# List of your Google YouTube API V3 keys
api_keys = ["my Google YouTube API V3 key 1", "my Google YouTube API V3 key 2", "etc."]

def build_youtube_service(api_key):
    return build('youtube', 'v3', developerKey=api_key)

def get_channel_videos(api_keys):
    videos = []
    next_page_token = None
    api_key_index = 0  # Start with the first API key
    
    while True:
        try:
            # Build the YouTube service object with the current API key
            youtube = build_youtube_service(api_keys[api_key_index])
            res = youtube.playlistItems().list(playlistId="UU...",
                                               part='snippet', 
                                               maxResults=50,
                                               pageToken=next_page_token).execute()
            videos += res['items']
            next_page_token = res.get('nextPageToken')
            if next_page_token is None:
                break
        except HttpError as e:
            if e.resp.status in [403, 429]:  # Quota error detected
                print(f"Quota error with API key {api_keys[api_key_index]}, switching keys...")
                api_key_index += 1  # Move to the next API key
                if api_key_index >= len(api_keys):  # Check if we've exhausted all API keys
                    print("All API keys have reached their quota limits.")
                    break
                continue
            else:
                raise  # Re-raise the exception if it's not a quota error
    
    return videos

videos = get_channel_videos(api_keys)

with open("video.txt", "a") as myfile:
    for video in videos:
        myfile.write(f"{video['snippet']['resourceId']['videoId']} => {video['snippet']['title']}\n")

print(f"Total video count => {len(videos)}")
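
And the first approach, incremental retrieval, could look like the sketch below (the state.json checkpoint file and its layout are illustrative assumptions):

import json
import os
from googleapiclient.discovery import build

STATE_FILE = "state.json"  # hypothetical checkpoint file holding progress between runs
api_key = "my Google YouTube API V3 key"
youtube = build('youtube', 'v3', developerKey=api_key)

# Resume from the previous checkpoint if one exists.
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        state = json.load(f)
else:
    state = {"next_page_token": None, "video_ids": []}

while True:
    res = youtube.playlistItems().list(playlistId="UU...",
                                       part='snippet',
                                       maxResults=50,
                                       pageToken=state["next_page_token"]).execute()
    state["video_ids"] += [item['snippet']['resourceId']['videoId'] for item in res['items']]
    state["next_page_token"] = res.get('nextPageToken')
    # Save progress after every page so the next run can resume from here.
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)
    if state["next_page_token"] is None:
        break
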
Jey

If the API itself doesn't work, you might consider traversing the returned page content, similar to this:

internal async Task<VideoInfo[]> GetDataFromWebsite(string url)
{
    //strings used for traversing the website
    string startSearch = "else {window.addEventListener('script-load-dpj',";
    string endSearch = "</div>";
    string beforeEachEntry = "\"ADD_TO_QUEUE_TAIL\"";
    string beforeurl = "{\"webCommandMetadata\":{\"url\":\"/watch";
    string afterUrl = "\",\"webPageType\"";
    string filterTitleBetter = ",\"width\":336,\"height\":188}]}";
    string beforeTitle = ",\"title\":{\"runs\":[{\"text\":\"";
    string aftertitle = "\"";

    VideoInfo[] videoInfos;

    using (HttpClient client = new HttpClient())
    {
        //Get a response from Youtube
        HttpResponseMessage response = await client.GetAsync(url);

        //check if the Request was successful
        if (response.IsSuccessStatusCode)
        {
            //Get the XML from the response
            string xml = await response.Content.ReadAsStringAsync();

            //narrow down the section where we search for useful data
            string[] allData = xml.Split(startSearch)[1].Split(endSearch)[0].Split(beforeEachEntry);

            //initialize the return object 
            //the size is one less than allData.Length because the first entry contains no data
            videoInfos = new VideoInfo[allData.Length - 1];

            //iterate through the data
            for (int i = 1; i < allData.Length; i++)
            {
                //narrow down the entry section for the title
                string[] splitForVideoName = allData[i].Split(filterTitleBetter);
                //split the entry string exactly to contain the video title
                string videoName = splitForVideoName[splitForVideoName.Length - 1].Split(beforeTitle)[1].Split(aftertitle)[0];

                //split the entry string exactly to contain the video url
                string videoURL = "https://www.youtube.com/watch" + allData[i].Split(beforeurl)[1].Split(afterUrl)[0];

                //add those information to our return object
                videoInfos[i - 1] = new VideoInfo() { videoURL = videoURL, videoName = videoName };
            }
            client.Dispose();
            return videoInfos;
        }
        else
        {
            client.Dispose();
            throw new Exception($"Request to {url} failed with status code {response.StatusCode}");
        }
    }
}

internal class VideoInfo
{
    internal string videoURL;
    internal string videoName;
}

This would need some more work to be fully functional for every playlist, but it has worked so far with https://www.youtube.com/playlist?list=PLlVlyGVtvuVlklX5bYqk9RHnOSmUP6j0h.