I need to automate tasks that extract information from websites, using the uBlock extension with ChromeDriver and the selenium module in Python 3.
I am running my code on a remote machine without a GUI, so I use xvfb-run to simulate a desktop environment in which Chrome launches with a specific window size.
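A minimal sketch of that setup follows. It assumes a 1920x1080 screen (the actual window size is not stated above) and a hypothetical script name `scrape.py`. The helpers only build argument lists, so they can be inspected without launching anything. `--auto-servernum` and `--server-args` are standard xvfb-run options and `--window-size` is a standard Chrome switch. The Xvfb screen should be at least as large as the Chrome window.

```python
from typing import List


def xvfb_command(script: str, width: int, height: int, depth: int = 24) -> List[str]:
    """Build an xvfb-run command line that runs `script` on a virtual screen."""
    return [
        "xvfb-run",
        "--auto-servernum",  # pick a free display number instead of failing on a busy one
        f"--server-args=-screen 0 {width}x{height}x{depth}",  # virtual screen geometry
        "python3",
        script,  # hypothetical script name supplied by the caller
    ]


def chrome_window_args(width: int, height: int) -> List[str]:
    """Chrome switches that give the browser window a fixed size."""
    return [f"--window-size={width},{height}"]
```

The first list could be handed to `subprocess.run(...)`. Inside the script, each string from `chrome_window_args(...)` would be passed to `option.add_argument(...)` before creating the driver.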
The remote machine runs the following Debian operating system:
uname -a
Linux mem 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
These were the steps I took to configure my environment and code on the remote machine:
1 - To configure my environment, I installed this version of Google Chrome:
google-chrome --version
Google Chrome 86.0.4240.111
2 - I checked the versions of Python 3 and selenium that I installed:
python --version
Python 3.7.3
pip freeze
selenium==3.141.0
3 - I checked the version of xvfb (the package that provides xvfb-run):
apt-cache policy xvfb
2:1.20.4-1+deb10u1
4 - With these packages configured, I downloaded chromedriver_linux64.zip from the list below (version 86.0.4240.22 is the most recent release with the same major version as the installed google-chrome):
https://chromedriver.storage.googleapis.com/index.html
https://chromedriver.storage.googleapis.com/index.html?path=86.0.4240.22/
5 - To use the uBlock extension with Chrome, I needed an extension that can produce a .crx archive of other installed extensions. For this, I used CRX Extractor/Downloader:
https://chrome.google.com/webstore/detail/crx-extractordownloader/ajkhmmldknmfjnmeedkbkkojgobmljda
6 - Using that extension, I obtained the ublock.crx file to test.
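Step 4 depends on ChromeDriver sharing its major version with the installed Chrome. A small sketch of that check follows. It assumes the version strings look like the `google-chrome --version` output above and that `chromedriver --version` prints something beginning with "ChromeDriver 86.0.4240.22". The function names are hypothetical.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major version from output such as 'Google Chrome 86.0.4240.111'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

On the remote machine, the strings could come from `subprocess.run(["google-chrome", "--version"], capture_output=True, text=True).stdout` and the same call on the chromedriver binary.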
I managed to use the chromedriver binary from chromedriver_linux64.zip, without the extension, to launch a Chrome instance and do some basic crawling.
But when I tried to use ublock.crx in my code, I got an exception.
This is the exception produced when loading ublock.crx:
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: cannot process extension #1
from unknown error: cannot unzip
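Because the message says ChromeDriver could not unzip the file, it may help to check whether ublock.crx really is a CRX archive. A CRX file starts with the magic bytes "Cr24" and a little-endian version number. In CRX2, key and signature lengths follow; in CRX3, a single header length follows. The zip payload comes after the header. The sketch below, with the hypothetical name `inspect_crx`, is based on that publicly documented layout and only verifies that a zip local-file header sits where the CRX header says it should. Passing this check does not prove ChromeDriver will accept the file.

```python
import struct
from typing import Tuple

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # every zip local file entry starts with these bytes


def inspect_crx(data: bytes) -> Tuple[int, int]:
    """Return (crx_version, zip_offset); raise ValueError if `data` is not CRX2/CRX3."""
    if len(data) < 12 or data[:4] != b"Cr24":
        raise ValueError("missing 'Cr24' magic number: not a CRX file")
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        # CRX2: public key length and signature length follow the version field
        key_len, sig_len = struct.unpack_from("<II", data, 8)
        offset = 16 + key_len + sig_len
    elif version == 3:
        # CRX3: one length-prefixed header follows the version field
        (header_len,) = struct.unpack_from("<I", data, 8)
        offset = 12 + header_len
    else:
        raise ValueError(f"unsupported CRX version {version}")
    if data[offset:offset + 4] != ZIP_LOCAL_HEADER:
        raise ValueError("no zip archive where the CRX header says it should start")
    return version, offset
```

For example, `inspect_crx(open(ublock_crx_file_path, "rb").read())` would raise immediately if the downloader had saved an HTML page or a truncated file instead of an extension.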
I am launching it from my program like this:
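from selenium import webdriver
# Hypothetical placeholder paths; my program uses the real locations
driver_path = "/path/to/chromedriver"
ublock_crx_file_path = "/path/to/ublock.crx"
option = webdriver.ChromeOptions()
option.add_extension(ublock_crx_file_path)  # load the packed .crx extension
driver = webdriver.Chrome(executable_path=driver_path, options=option)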
I have made sure that ublock_crx_file_path is a valid path and points to the file I obtained from Chrome.
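One workaround sometimes used is to unpack the extension and load it with Chrome's `--load-extension` switch instead of `add_extension`. This is sketched here as an assumption, not a confirmed fix for this error. Python's `zipfile` locates an archive from its end and tolerates data prepended to it, so it can usually open a CRX file directly and skip the CRX header. The helper `unpack_crx` and the target directory are hypothetical.

```python
import os
import zipfile


def unpack_crx(crx_path: str, target_dir: str) -> str:
    """Extract the zip payload of a .crx file and return a --load-extension switch."""
    # zipfile finds the central directory at the end, so the CRX header is skipped
    with zipfile.ZipFile(crx_path) as archive:
        bad_member = archive.testzip()  # CRC-check every member before extracting
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member in {crx_path}: {bad_member}")
        archive.extractall(target_dir)
    # an unpacked Chrome extension needs manifest.json at its root
    if not os.path.isfile(os.path.join(target_dir, "manifest.json")):
        raise FileNotFoundError("no manifest.json found: not an unpacked extension")
    return "--load-extension=" + os.path.abspath(target_dir)
```

In the launch code, `option.add_extension(ublock_crx_file_path)` would then be replaced by something like `option.add_argument(unpack_crx(ublock_crx_file_path, "/tmp/ublock"))`. If `zipfile` itself raises `BadZipFile`, the file is probably not a usable archive at all.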
Can anyone shed light on what is causing this?