Main question
For a few months now, I've been downloading videos from Flickr in Colab using Selenium. However, about a week ago my code stopped working.
It now gets a 502 Bad Gateway error for every video it tries to open. Thinking it was a temporary glitch, I waited a few days for the problem to go away, but it didn't.
I tried changing the Selenium configuration, but that didn't help much. I also tried using only urllib, but it gives me the same error. My last resort was a proxy, which almost worked, but I wasn't able to actually download the video once it had loaded.
I usually avoid asking questions here, but it seems this problem has bested me. Could someone give me a hand, please?
Complementary code
Here's the main part of the code that I've been using:
!sudo apt update
!pip install chromedriver-autoinstaller selenium
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
from IPython.display import clear_output
from selenium import webdriver
import chromedriver_autoinstaller
import sys
import os
import re
import urllib.request
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chromedriver_autoinstaller.install()
clear_output()
driver = webdriver.Chrome(options=chrome_options)
videos_folder = "/content/videos/"
os.mkdir(videos_folder)
def download_video_from_url(url, name):
    # Load the video page and look for a quoted .mp4 URL in its HTML.
    driver.get(url)
    html = driver.page_source
    video_url = re.findall(r'"[^ ]*\.mp4[^ ]*"', html)
    if len(video_url) > 0:
        video_url = video_url[0][1:-1]  # strip the surrounding quotes
        try:
            with open(videos_folder + name, "wb") as f:
                f.write(urllib.request.urlopen(video_url).read())
            return 1
        except Exception:
            return 0
    else:
        return 0
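To make the extraction step easier to test separately from Selenium and the network, the regex part can be pulled out on its own. This is only a sketch: `extract_mp4_url` and `sample_html` are hypothetical names introduced for illustration, and the sample fragment is made up rather than taken from a real Flickr page.

```python
import re

# Same pattern as in download_video_from_url: a double-quoted,
# space-free token that contains ".mp4".
MP4_PATTERN = re.compile(r'"[^ ]*\.mp4[^ ]*"')


def extract_mp4_url(html):
    """Return the first quoted .mp4 URL found in html, or None if there is none."""
    match = MP4_PATTERN.search(html)
    if match is None:
        return None
    return match.group(0)[1:-1]  # drop the surrounding double quotes


# Made-up page fragment, only for illustration.
sample_html = '<video src="https://example.com/clip.mp4?s=abc" controls></video>'
print(extract_mp4_url(sample_html))
```

If this helper returns None for the `driver.page_source` of a failing page, the page itself did not load; if it returns a URL, the failure is in the later download request.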
And here's my code that uses a proxy:
!sudo apt update
!pip install chromedriver-autoinstaller selenium
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
from IPython.display import clear_output
from selenium import webdriver
import chromedriver_autoinstaller
import sys
import time
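import os
import re
import urllib.request
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys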
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chromedriver_autoinstaller.install()
clear_output()
driver = webdriver.Chrome(options=chrome_options)
videos_folder = "/content/videos/"
os.mkdir(videos_folder)
def download_video_from_url(url, name):
    try:
        # Submit the video page URL through the web proxy's form.
        driver.get("https://www.proxysite.com/")
        element = driver.find_element(By.XPATH, '//*[@id="url-form-wrap"]/form/div[2]/input')
        element.send_keys(url)
        element.send_keys(Keys.ENTER)
        html = driver.page_source
        time.sleep(5)
        # Send TAB and ENTER key presses to the page body.
        element = driver.find_element(By.XPATH, '/html/body')
        for i in range(5):
            element.send_keys(Keys.TAB)
        for i in range(2):
            element.send_keys(Keys.ENTER)
        # Nothing is written to videos_folder here, so `name` is unused.
        return 1
    except Exception:
        return 0
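The urllib-only attempt mentioned in the main question is not shown above. A minimal sketch of that kind of check, assuming the goal is just to see which HTTP status the server returns, could look like this. `fetch_status` is a hypothetical helper name, not part of the original code.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including error codes such as 502."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return error.code
```

Calling `fetch_status` on a video page URL and on the extracted .mp4 URL would show whether the 502 comes from the page request, the file request, or both. Connection-level failures (`urllib.error.URLError`) are deliberately not caught here, so they stay visible.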