I have a list of YouTube videos from different playlists (around 1000 of them) and I need to check whether they are still valid. At the moment I am hitting YouTube through its API v2 from Groovy with this simple script:
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET

http = new HTTPBuilder('http://gdata.youtube.com')

myVideoIds.each { id ->
    if (!isValidYoutubeUrl(id)) {
        // do stuff
    }
}

boolean isValidYoutubeUrl (id) {
    boolean valid = true
    http.request(GET) {
        uri.path = "feeds/api/videos/${id}"
        headers.'User-Agent' = 'Mozilla/5.0 Ubuntu/8.10 Firefox/3.0.4'
        response.failure = { resp ->
            valid = false
        }
    }
    valid
}
but after a few seconds it starts to return 403 for every single ID (possibly because it is sending too many requests in quick succession). The problem is reduced if I insert something like Thread.sleep(3000). Is there a better solution than just delaying the requests?
In V2 of the API, there are time-based limits on how many requests you can make, but they aren't a hard and fast limit (that is, it depends somewhat on many under-the-hood factors and may not always be the same limit). Here's what the documentation says:
You can avoid this by putting in a sleep like you do, but you'd want it to be 10-15 seconds or so.
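As a rough illustration of the sleep-based approach, here is a minimal sketch. The helper name findInvalidIds and its parameters are hypothetical (they are not part of any API); the check closure stands in for the isValidYoutubeUrl method from the question.

```groovy
// Hypothetical helper: runs `check` on every ID, pausing `delayMs` between calls.
// Returns the IDs for which `check` returned false.
List findInvalidIds(List ids, long delayMs, Closure check) {
    def invalid = []
    ids.eachWithIndex { id, i ->
        // No need to wait before the very first request
        if (i > 0) {
            Thread.sleep(delayMs)
        }
        if (!check(id)) {
            invalid << id
        }
    }
    invalid
}
```

It could be called as findInvalidIds(myVideoIds, 10000L) { isValidYoutubeUrl(it) }. Note that with a 10 second pause, 1000 videos take roughly 2.8 hours, which is why delaying alone is not a great fix.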
It's more important, though, to implement batch processing. With this, you can make up to 50 requests at once (this counts as 50 requests against your overall request per day quota, but only as one against your per time quota). Batch processing with v2 of the API is a little involved, as you make a POST request to a batch endpoint first, and then based on those results you can send in the multiple requests. Here's the documentation:
https://developers.google.com/youtube/2.0/developers_guide_protocol?hl=en#Batch_processing
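If you use v3 of the API, batch processing becomes quite a bit easier, as you just send up to 50 IDs at a time in a single request. Change the base URL passed to HTTPBuilder from http://gdata.youtube.com to the v3 host, https://www.googleapis.com, then set your uri.path to the videos endpoint (youtube/v3/videos) and pass part=id together with a comma-separated list of IDs in the id parameter.
For 1000 videos, then, you'll only need to make 20 calls. Any video that doesn't come back in the list doesn't exist anymore (if you need to get video details, change the part parameter to id,snippet,contentDetails or something appropriate for your needs). Here's the documentation: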
https://developers.google.com/youtube/v3/docs/videos/list#id
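The v3 approach could look something like the sketch below. It is only an outline under some assumptions: the names findMissingVideoIds and makeFetcher are made up for this example, it uses plain java.net.URL plus JsonSlurper instead of HTTPBuilder to stay dependency-free, and it assumes you have an API key for v3 requests (passed as the key parameter).

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: returns the IDs that the API did NOT send back.
// `fetchJson` is an assumed closure taking a map of query parameters
// and returning the parsed JSON response (a map with an `items` list).
List<String> findMissingVideoIds(List<String> videoIds, Closure fetchJson) {
    List<String> missing = []
    // collate(50) splits the list into chunks of at most 50 IDs
    videoIds.collate(50).each { batch ->
        def response = fetchJson([part: 'id', id: batch.join(',')])
        // Each returned item carries the video ID in its `id` field
        def returned = (response.items ?: []).collect { it.id } as Set
        missing.addAll(batch.findAll { !(it in returned) })
    }
    missing
}

// One possible fetcher (an assumption, not the only way to do it):
// builds the query string and reads the videos.list endpoint.
Closure makeFetcher(String apiKey) {
    return { Map params ->
        String query = (params + [key: apiKey]).collect { k, v ->
            "${k}=${URLEncoder.encode(v.toString(), 'UTF-8')}"
        }.join('&')
        String url = "https://www.googleapis.com/youtube/v3/videos?${query}"
        new JsonSlurper().parseText(new URL(url).text)
    }
}
```

Usage would be along the lines of findMissingVideoIds(myVideoIds, makeFetcher(myApiKey)), where myApiKey is your own key. Keeping the fetcher as a separate closure makes it easy to swap in HTTPBuilder or to test the batching logic without touching the network.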