I scrape data from one site using Python, Selenium and ChromeDriver, and I use my Google Chrome profile to get past the captcha. I now need to run the script in a Docker container, but I can't launch Chrome with my profile from inside Docker.
My Dockerfile:
FROM python:3.8
# Add Google's signing key to apt's trusted keys
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
# Updating apt to see and install Google Chrome
RUN apt-get -y update
# Install Google Chrome
RUN apt-get install -y google-chrome-stable
# Installing Unzip
RUN apt-get install -yqq unzip
# Download the Chrome Driver
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
# Unzip the Chrome Driver into /usr/local/bin directory
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# Set display port as an environment variable
ENV DISPLAY=:99
WORKDIR /app
COPY /src/yamarket_parsing_container.py .
COPY /requirements.txt .
RUN pip install -r requirements.txt
CMD ["python", "yamarket_parsing_container.py"]
My code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PATH_TO_CHROME_PROFILE = r"user-data-dir=../root/.config/google-chrome/User Data/"
PROFILE_DIR_NAME = '--profile-directory=Default'

def init_browser(p_profile_path, p_profile_dir_name):
    # Start headless Chrome with the given profile arguments
    options = Options()
    options.add_argument(p_profile_path)
    options.add_argument(p_profile_dir_name)
    options.add_argument('--headless=new')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    return webdriver.Chrome(options=options)

m_browser = init_browser(PATH_TO_CHROME_PROFILE, PROFILE_DIR_NAME)
Run command:
D:\Shared\parsing_yandex_market>docker run -ti -v "C:\Users\Victor\AppData\Local\Google\Chrome\User Data":/root/.config/google-chrome img
Error:
Traceback (most recent call last):
File "yamarket_parsing_container.py", line 114, in <module>
main()
File "yamarket_parsing_container.py", line 90, in main
m_browser = init_browser(PATH_TO_CHROME_PROFILE, PROFILE_DIR_NAME)
File "yamarket_parsing_container.py", line 28, in init_browser
return webdriver.Chrome(options=options)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
super().__init__(
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
super().__init__(
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Could not read in devtools port number
Stacktrace:
#0 0x55832cd86243 <unknown>
#1 0x55832cb4a7a6 <unknown>
#2 0x55832cb76ed0 <unknown>
#3 0x55832cb72a51 <unknown>
#4 0x55832cb6f49b <unknown>
#5 0x55832cbb12a7 <unknown>
#6 0x55832cbb08cf <unknown>
#7 0x55832cba7e53 <unknown>
#8 0x55832cb7a9ea <unknown>
#9 0x55832cb7bb2e <unknown>
#10 0x55832cddad5e <unknown>
#11 0x55832cddea80 <unknown>
#12 0x55832cdc08b0 <unknown>
#13 0x55832cddfb63 <unknown>
#14 0x55832cdb1f75 <unknown>
#15 0x55832ce02998 <unknown>
#16 0x55832ce02b27 <unknown>
#17 0x55832ce1dc23 <unknown>
#18 0x7f74762c1ea7 start_thread
Has anyone run into this problem? I would be grateful for any advice.
I have tried placing the "User Data" directory in different locations, but without success: ChromeDriver still fails to initialize.
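One thing that may be worth checking, given the run command above: the volume maps the host's "User Data" folder itself onto /root/.config/google-chrome, so the container may not contain a "User Data" subfolder at all, while the code points at one through a relative path. The following is a minimal diagnostic sketch, not part of the original script. The helper names `describe_profile` and `chrome_profile_args` are hypothetical, and the two candidate paths are assumptions derived from the run command. It only inspects the filesystem and builds argument strings; it does not start Chrome, and it may not be the cause of the error.

```python
from pathlib import Path


def chrome_profile_args(user_data_dir, profile_name="Default"):
    """Build the two Chrome profile switches from an absolute directory.

    Hypothetical helper: it refuses relative paths so the result does not
    depend on the container's working directory.
    """
    root = Path(user_data_dir)
    if not root.is_absolute():
        raise ValueError(f"expected an absolute path, got {user_data_dir!r}")
    return [f"--user-data-dir={root}", f"--profile-directory={profile_name}"]


def describe_profile(user_data_dir, profile_name="Default"):
    """Report what exists at the given location (hypothetical helper)."""
    root = Path(user_data_dir)
    root_exists = root.is_dir()
    return {
        "user_data_dir_exists": root_exists,
        "profile_dir_exists": (root / profile_name).is_dir(),
        # Top-level entries help to see what the mount really exposes.
        "entries": sorted(p.name for p in root.iterdir()) if root_exists else [],
    }


if __name__ == "__main__":
    # Assumed container-side mount target, taken from the docker run command.
    mount_target = "/root/.config/google-chrome"
    for candidate in (mount_target, mount_target + "/User Data"):
        print(candidate, describe_profile(candidate))
```

Run inside the container, this would show which of the two candidate paths actually holds a "Default" folder. The list returned by `chrome_profile_args` could then be passed item by item to `options.add_argument` in place of the two constants.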