What is currently the easiest way to download videos from Flickr in Google Colab?


Main question

For a few months now, I've been downloading videos from Flickr in Colab using Selenium. However, about a week ago, my code stopped working.

It now gets a 502 Bad Gateway error on every video it tries to open. Thinking it was a temporary glitch, I waited a few days for the problem to go away, but it didn't.

I tried changing the Selenium configuration, but that didn't help much. I also tried using only urllib, but it gives me the same error. My last resort was a proxy, which almost worked: the video loaded, but I wasn't able to actually download it.

I usually avoid asking questions here, but it seems this problem has bested me. Could someone give me a hand, please?
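To make the urllib-only attempt easier to reproduce, here is a minimal sketch that reports the HTTP status code for a URL, with and without a browser-like User-Agent header. The helper names `build_request` and `fetch_status` and the `BROWSER_UA` string are made up for illustration, and whether the header has any effect on the 502 is only a guess, not something confirmed here.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; any realistic value could be substituted.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

def build_request(url, user_agent=None):
    """Build a Request, adding a User-Agent header only if one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)

def fetch_status(url, user_agent=None, timeout=30):
    """Return the HTTP status code for url, including error codes such as 502."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still the useful part here.
        return err.code
```

Calling `fetch_status(video_page_url)` and `fetch_status(video_page_url, BROWSER_UA)` on the same page (where `video_page_url` is a placeholder for one of the failing Flickr links) would show whether the two requests are treated differently.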

Complementary code

Here's the main part of the code that I've been using:

!sudo apt update
!pip install chromedriver-autoinstaller selenium
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

from IPython.display import clear_output
from selenium import webdriver
import chromedriver_autoinstaller
import os
import re
import sys
import urllib.request

sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chromedriver_autoinstaller.install()
clear_output()
driver = webdriver.Chrome(options=chrome_options)

videos_folder = "/content/videos/"
os.makedirs(videos_folder, exist_ok=True)  # does not fail if the folder already exists

def download_video_from_url(url, name):
  """Save the first .mp4 found in the page source. Returns 1 on success, 0 on failure."""
  driver.get(url)
  html = driver.page_source
  # Quoted strings in the page source that contain ".mp4"
  video_urls = re.findall(r'"[^ ]*\.mp4[^ ]*"', html)
  if not video_urls:
    return 0
  video_url = video_urls[0][1:-1]  # strip the surrounding quotes
  try:
    with open(os.path.join(videos_folder, name), "wb") as f:
      f.write(urllib.request.urlopen(video_url).read())
    return 1
  except Exception as e:
    # Print the error instead of hiding it, so failures such as the 502 are visible
    print(f"Download failed for {video_url}: {e}")
    return 0
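As a side note on the extraction step above: the pattern `"[^ ]*\.mp4[^ ]*"` only excludes spaces, so a match can run across several quoted strings when the page source has no spaces between them. Below is a sketch of a stricter variant that also excludes inner quotes. The name `extract_mp4_urls` is hypothetical, and the handling of `\/` assumes the URL may sit inside embedded JSON with escaped slashes, which may or may not be the case for Flickr pages.

```python
import re

# A double-quoted string containing ".mp4", with no whitespace or inner quotes.
MP4_PATTERN = re.compile(r'"([^"\s]*\.mp4[^"\s]*)"')

def extract_mp4_urls(html):
    """Return candidate .mp4 URLs found in html, undoing JSON-escaped slashes."""
    return [match.replace("\\/", "/") for match in MP4_PATTERN.findall(html)]
```

The first element of `extract_mp4_urls(driver.page_source)` could then take the place of `video_url` in the function above. An empty list means no candidate was found, for instance because the page itself came back as an error page.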

And here's my code that uses a proxy:

!sudo apt update
!pip install chromedriver-autoinstaller selenium
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

from IPython.display import clear_output
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import chromedriver_autoinstaller
import os
import sys
import time

sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chromedriver_autoinstaller.install()
clear_output()
driver = webdriver.Chrome(options=chrome_options)

videos_folder = "/content/videos/"
os.makedirs(videos_folder, exist_ok=True)  # does not fail if the folder already exists

def download_video_from_url(url, name):
  """Load the video page through proxysite.com. Returns 1 on success, 0 on failure."""
  try:
    driver.get("https://www.proxysite.com/")
    # Type the Flickr URL into the proxy's URL field and submit it
    element = driver.find_element(By.XPATH, '//*[@id="url-form-wrap"]/form/div[2]/input')
    element.send_keys(url)
    element.send_keys(Keys.ENTER)
    time.sleep(5)  # give the proxied page time to load
    # Tab to the video player and press Enter; this loads the video,
    # but nothing is saved to videos_folder yet
    element = driver.find_element(By.XPATH, '/html/body')
    for i in range(5): element.send_keys(Keys.TAB)
    for i in range(2): element.send_keys(Keys.ENTER)
    return 1
  except Exception as e:
    print(f"Proxy attempt failed for {url}: {e}")
    return 0