How to prevent fake user-agent detection in Selenium headless mode?


I am running a scraping bot in headless mode. In headless mode, the user agent contains the string "Headless". To avoid that, I changed the user agent, but the website detects the fake user agent and blocks the bot. How can I prevent this detection?

I am using Selenium with ChromeDriver.


There is 1 best solution below

On BEST ANSWER

Add these options:

    require "selenium-webdriver"

    # Use the user agent that matches the platform you want to report.
    windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
    linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"

    # Skip this line if you already have an options object.
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--no-sandbox")
    options.add_argument("--user-agent=#{linux_useragent}")
    options.add_argument("--disable-web-security")
    options.add_argument("--disable-xss-auditor")
    options.add_option("excludeSwitches", ["enable-automation", "load-extension"])

navigator.platform must match navigator.userAgent.

If the user agent is for Windows, navigator.platform should be "Win32".

If the user agent is for Linux, navigator.platform should be "Linux x86_64".
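
The matching rule above can be written as a small lookup. This is a minimal sketch, not part of the original answer: `platform_for` is a hypothetical helper name, and it only covers the two cases listed here (Windows and Linux), raising an error for anything else.

```ruby
# Hypothetical helper: return the navigator.platform value that matches
# a given user-agent string. Only the two cases discussed above are
# handled. Note that Android user agents also contain "Linux", so this
# is not suitable for mobile user agents.
def platform_for(user_agent)
  case user_agent
  when /Windows NT/ then "Win32"
  when /Linux/      then "Linux x86_64"
  else
    raise ArgumentError, "no known platform for user agent: #{user_agent}"
  end
end

linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
puts platform_for(linux_useragent)
```

Deriving the platform from the user agent in one place keeps the two values from drifting apart when the user agent is changed later.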

You can set it like this:

    platform = {
      windows: "Win32",
      linux: "Linux x86_64"
    }

    # JavaScript that runs in every new document before the page's own scripts.
    script = <<~JS
      Object.defineProperty(navigator, 'webdriver', {
        get: () => undefined
      });
      Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en']
      });
      Object.defineProperty(navigator, 'platform', {
        get: () => "#{platform[:linux]}"
      });
    JS

    # execute_cdp takes the CDP parameters as keyword arguments.
    driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", source: script)

And of course, navigator.webdriver needs to be set to undefined, as the script above does.
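
Hand-escaping quotes inside the injected JavaScript is easy to get wrong. As a sketch (not from the original answer), the script can be generated from Ruby values with `to_json`, which here yields text that is also valid as a JavaScript string or array literal. `build_override_script` is a hypothetical helper name; its result would be passed as the `source` parameter of `Page.addScriptToEvaluateOnNewDocument`.

```ruby
require "json"

# Hypothetical helper: build the JavaScript that overrides navigator
# properties. to_json converts Ruby strings and arrays into literals,
# so no manual quote escaping is needed.
def build_override_script(platform:, languages: ["en-US", "en"])
  overrides = {
    "webdriver" => "undefined",        # raw JavaScript expression
    "languages" => languages.to_json,  # array literal
    "platform"  => platform.to_json    # quoted string literal
  }
  overrides.map do |property, js_value|
    "Object.defineProperty(navigator, #{property.to_json}, { get: () => #{js_value} });"
  end.join("\n")
end

puts build_override_script(platform: "Linux x86_64")
```

Each property becomes its own statement ending in a semicolon, so adding or removing an override does not require touching the others.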