Seleniumbase not passing detection on Linux


I am struggling to load a page successfully when I run my script on Linux. I've tried Ubuntu 23.10 and Alpine 3.18.

On my Mac, using PyCharm + selenium (or seleniumbase) + undetected-chromedriver, I can load the page and perform my automation.

I've also tried a Windows 11 machine, and the script worked there too.

So I am going crazy trying to work out why this doesn't work as desired on a Linux install.

I'm using a venv, some key version details:

    Python 3.11.6
    selenium             4.16.0
    seleniumbase         4.22.0

    chromedriver --version
    ChromeDriver 120.0.6099.71 (9729082fe6174c0a371fc66501f5efc5d69d3d2b-refs/branch-heads/6099_56@{#13})

    google-chrome --version
    Google Chrome 120.0.6099.71 
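
A mismatch between the driver and browser major versions is a common cause of odd behaviour, so it can be worth checking the two `--version` outputs against each other. The following is only a minimal sketch: the helper names `major_version` and `versions_match` are made up for illustration and are not part of selenium or seleniumbase.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major version from a `--version` output line."""
    # Matches the first four-part version number, e.g. 120.0.6099.71
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(driver_output: str, browser_output: str) -> bool:
    """True when chromedriver and Chrome report the same major version."""
    return major_version(driver_output) == major_version(browser_output)


# The two outputs reported above, shortened to the relevant part.
driver_line = "ChromeDriver 120.0.6099.71"
browser_line = "Google Chrome 120.0.6099.71"
print(versions_match(driver_line, browser_line))
```

Here both lines report major version 120, so a version mismatch does not look like the cause.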

I am using send_keys to enter my username and password into the login.

  • The next page - working = Account Details page or an MFA challenge
  • The next page - not working = Something went wrong

At the most basic of tests I have tried:

My preferred way:

    from seleniumbase import Driver
    driver = Driver(uc=True, headless2=True)
    driver.get("https://www.originenergy.com.au/my")
    print("UA= ", driver.get_user_agent())

    driver.type('//*[@id="login_username"]', "email_address")
    driver.type('//*[@id="password"]', "password")
    driver.click('//*[@id="root"]/div/div/div/div/div/div[2]/form/div/button')

Also tried:

    from seleniumbase import SB
    
    with SB(uc=True, headless=True, xvfb=True) as sb:
        sb.open("https://www.originenergy.com.au/my")
        print("UA: ", sb.get_user_agent())
        print("\npage source\n", sb.get_page_source())
        sb.type('//*[@id="login_username"]', "email_address")
        sb.type('//*[@id="password"]', "password")
        print("\nUsername and Password entered\n")
        sb.click('//*[@id="root"]/div/div/div/div/div/div[2]/form/div/button')
        sb.sleep(10)

In another attempt, to rule out something like cookies being the issue, I tried using a user profile with all third-party cookies enabled:

    driver = Driver(uc=True, headless2=True, user_data_dir="/script/datadir")

If I watch the Linux machine with a virtual display and headed=True, just after the login button is clicked there seem to be some JS scripts running (with "undetected-chromedriver 1337!" comments), yet the site still catches me with error 429 and an ID issue. On the Mac and Windows machines I also received the 429, but login still succeeded.

Any advice or guidance would be awesome. I really don't want to run a windows VM just to run this script.

=== Edit and update ===

I rebuilt the Linux environment with the bare essentials: Python 3.11, venv, pip, seleniumbase, x11vnc, xvfb, google-chrome

Ran my script using the "from seleniumbase import Driver" method and changed to headed=True:

    from seleniumbase import Driver
    driver = Driver(uc=True, headed=True, user_data_dir="/script/newdir")
    driver.get("https://www.originenergy.com.au/my")
    print("UA= ", driver.get_user_agent())

    driver.type('//*[@id="login_username"]', "email_address")
    driver.type('//*[@id="password"]', "password")
    driver.click('//*[@id="root"]/div/div/div/div/div/div[2]/form/div/button')

The first console output shows:

    ** chromedriver to download = 120.0.6099.71 (Latest Stable)

    Downloading chromedriver-linux64.zip from:
    https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/120.0.6099.71/linux64/chromedriver-linux64.zip ...
    Download Complete!

    Extracting ['chromedriver'] from chromedriver-linux64.zip ...
    Unzip Complete!

    The file [uc_driver] was saved to:
    /script/venv/lib/python3.11/site-packages/seleniumbase/drivers/uc_driver

    Making [uc_driver 120.0.6099.71] executable ...
    [uc_driver 120.0.6099.71] is now ready for use!

My user agent doesn't appear as headless:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

So I assume everything is performing as desired from here.
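
The same check can be done in code rather than by eye. This is a rough sketch that assumes a default headless Chrome advertises itself with a "HeadlessChrome" token in the user agent; `looks_headless` is a hypothetical helper name, and a clean user agent is only one of many signals a site can inspect.

```python
def looks_headless(user_agent: str) -> bool:
    """Return True if the user agent string advertises headless Chrome."""
    # Default headless Chrome reports "HeadlessChrome/<version>" in its UA.
    return "headless" in user_agent.lower()


# The user agent printed by the script above.
reported_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(looks_headless(reported_ua))
```

In a live script the string would come from `driver.get_user_agent()`, as in the snippets above.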

I can watch the Chrome tab open and the username and password being entered, with no "1337" messages in the DevTools console this time, yet I still get the same "Something went wrong" page.

The devtools console errors I get are:

Failed to load resource: the server responded with a status of 429 () - /149e9513-01fa-4fb0-…x-kpsdk-v=j-0.0.0:1

This looks to be the Kasada keystroke SDK. These errors appear before any text is input with send_keys().

I let the page load with a sleep timer and also sleep before clicking login.

However, these same errors are also logged on both my Mac and Windows machines, so I don't know if it's a red herring or a legitimate issue to address.

The final error that appears when I am redirected to the "Something went wrong" page is this:

[error] handleError invoked [email protected] with errorCode: 429 and uniqueId: N/A Error: [email protected]:  errorCode: 429, uniqueId: N/A
    at Ue (LoginErrorHandler.ts:15:7)
    at r.handleError (withLogin.tsx:104:61)
    at withLogin.tsx:140:14
    at s (runtime.js:63:15)
    at Generator._invoke (runtime.js:293:1)
    at Generator.throw (runtime.js:118:1)
    at i (asyncToGenerator.js:5:1)
    at u (asyncToGenerator.js:31:1)
    at nrWrapper (login?state=hKFo2SBT…Mi4yIn0%3D:56:18441)
r   @   fs.js:4
p   @   sumoLogger.js:10
error   @   index.js:2
Ue  @   LoginErrorHandler.ts:13
(anonymous) @   withLogin.tsx:104
(anonymous) @   withLogin.tsx:140
s   @   runtime.js:63
(anonymous) @   runtime.js:293
(anonymous) @   runtime.js:118
i   @   asyncToGenerator.js:5
u   @   asyncToGenerator.js:31
nrWrapper   @   login?state=hKFo2SBT…oiOS4xMi4yIn0%3D:56
Promise.then (async)        
nrWrapper   @   login?state=hKFo2SBT…oiOS4xMi4yIn0%3D:56
i   @   asyncToGenerator.js:15
c   @   asyncToGenerator.js:27
(anonymous) @   asyncToGenerator.js:34
nrWrapper   @   login?state=hKFo2SBT…oiOS4xMi4yIn0%3D:56
r   @   login?state=hKFo2SBT…oiOS4xMi4yIn0%3D:56
t   @   _export.js:36
(anonymous) @   asyncToGenerator.js:23
(anonymous) @   withLogin.tsx:21
m   @   react-dom.production.min.js:15
w   @   react-dom.production.min.js:15
(anonymous) @   react-dom.production.min.js:16
S   @   react-dom.production.min.js:16
C   @   react-dom.production.min.js:17
k   @   react-dom.production.min.js:17
N   @   react-dom.production.min.js:17
Rn  @   react-dom.production.min.js:85
Fn  @   react-dom.production.min.js:87
t.unstable_runWithPriority  @   scheduler.production.min.js:20
ho  @   react-dom.production.min.js:113
yc  @   react-dom.production.min.js:207
Dn  @   react-dom.production.min.js:86
nrWrapper   @   login?state=hKFo2SBT…oiOS4xMi4yIn0%3D:56

Screenshot of DevTools since I can't embed yet


There is 1 best solution below


That debug message you saw ("undetected-chromedriver 1337!") isn't coming from seleniumbase. It is coming from undetected-chromedriver.


Since seleniumbase has its own modified fork of undetected-chromedriver, it's likely that the script you're running on Linux isn't a seleniumbase one, or that you somehow overwrote a folder with an undetected-chromedriver one.

As for avoiding detection on Linux starting with Chrome 120, use headed=True, as that seems to work. (That's how you prevent the default headless mode on Linux in SeleniumBase.) There's some info on that here: https://github.com/seleniumbase/SeleniumBase/issues/2354#issuecomment-1849060186

If you're setting a custom user_data_dir, make sure that directory hasn't previously been used by Chrome outside of UC Mode, as that would mix incompatible configurations.
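
One way to be sure the profile has never been touched by a regular Chrome session is to start from a brand-new directory. Here is a minimal sketch using only the standard library; `fresh_profile_dir` and `is_unused_profile` are hypothetical helper names, not SeleniumBase functions.

```python
import tempfile
from pathlib import Path


def fresh_profile_dir(prefix: str = "uc_profile_") -> str:
    """Create and return a new, empty directory for a UC Mode profile."""
    return tempfile.mkdtemp(prefix=prefix)


def is_unused_profile(path: str) -> bool:
    """True if the path does not exist yet or is an empty directory."""
    p = Path(path)
    if not p.exists():
        return True
    return p.is_dir() and not any(p.iterdir())


profile = fresh_profile_dir()
print(profile, is_unused_profile(profile))

# The path can then be passed to the driver, e.g.:
# driver = Driver(uc=True, headed=True, user_data_dir=profile)
```

A temp directory is thrown away between machines or rebuilds, so if you need the login session to persist, pick a fixed path instead and only ever open it in UC Mode.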

If you're using a shared IP space such as GitHub Actions, that'll get you detected in UC Mode, because lots of IP spaces have been flagged as bot traffic. Sites will throw a CAPTCHA at you even if they didn't detect Selenium, because they can see your IP address unless you use a proxy server to mask it.
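
If you do go the proxy route, SeleniumBase's Driver() takes a proxy argument as a "SERVER:PORT" or "USER:PASS@SERVER:PORT" string. A small sketch for building that string follows; `proxy_string` is a hypothetical helper, and the host, port and credentials are placeholders rather than a real proxy.

```python
from typing import Optional


def proxy_string(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Build a proxy string in SERVER:PORT or USER:PASS@SERVER:PORT form."""
    server = f"{host}:{port}"
    if username and password:
        return f"{username}:{password}@{server}"
    return server


# Placeholder values only; substitute your own proxy details.
print(proxy_string("proxy.example.com", 8080))

# The result can then be passed to the driver, e.g.:
# driver = Driver(uc=True, headed=True, proxy=proxy_string("proxy.example.com", 8080))
```

Whether a given proxy helps depends on the reputation of its IP range, so this is not a guaranteed fix.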

For slower internet connections, you'll need to use driver.uc_open_with_reconnect(URL, reconnect_time=5) to make sure that the driver is still disconnected when the page finishes loading.

If you're using the Driver() format, then use https://github.com/mdmintz/sbVirtualDisplay for Xvfb activation. E.g.:

    from sbvirtualdisplay import Display
    from seleniumbase import Driver

    # The page from the question
    URL = "https://www.originenergy.com.au/my"

    # Start a virtual display (Xvfb) so headed Chrome can run without a screen
    display = Display(visible=0, size=(1440, 1880))
    display.start()

    driver = Driver(uc=True, headed=True)
    driver.uc_open_with_reconnect(URL, reconnect_time=5)
    # ...
    driver.quit()

    display.stop()