Headless Chromium Browser always displays Captcha


I am using headless Google Chrome with headless-chromium-php to navigate to some websites, but it is always detected and shown a CAPTCHA.

I tried randomizing the user agent with this plugin, but nothing changed:

        $UserAgent = \Campo\UserAgent::random([
            'os_type' => 'Windows',
            'device_type' => 'desktop'
        ]);

        $browserFactory = new BrowserFactory('/opt/google/chrome/google-chrome');

        $browser = $browserFactory->createBrowser([
            'sendSyncDefaultTimeout' => 5000,
            'userAgent' => $UserAgent
        ]);
        $page = $browser->createPage();

        $page->navigate($NextURL)->waitForNavigation();

Why is this happening?


There is 1 best solution below


I strongly advise reading both articles below and accounting for all of these techniques in your code to make headless Chrome harder to detect:

Detecting Chrome headless, new techniques

It is not possible to detect and block Chrome Headless

Headless browser detectors usually combine several techniques to identify whether your browser is being remotely controlled, for example by inspecting browser environment properties:

User Agent (old)

The user agent is the attribute commonly used to detect the user's OS and browser. On a Linux computer with headless Chrome version 63 it has the following value: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/63.0.3071.115 Safari/537.36

Thus, we can check for the presence of Chrome Headless:

if (/HeadlessChrome/.test(window.navigator.userAgent)) {
    console.log('Chrome headless detected!')
}
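As a minimal sketch, the same test can be written as a function over a plain string, which makes it easy to see why overriding the user agent defeats this one check and nothing else. The helper name isHeadlessUserAgent and the overridden string are assumptions made for illustration, not part of the quoted article.

```javascript
// Hypothetical helper: returns true when a user-agent string carries the
// "HeadlessChrome" token that headless Chrome sends by default.
function isHeadlessUserAgent(userAgent) {
    return /HeadlessChrome/.test(String(userAgent))
}

// The headless user agent quoted above.
const headlessUA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/63.0.3071.115 Safari/537.36'

// The same string with the token replaced, roughly what an overridden
// user agent would look like (illustrative value only).
const overriddenUA = headlessUA.replace('HeadlessChrome', 'Chrome')

console.log(isHeadlessUserAgent(headlessUA))   // true
console.log(isHeadlessUserAgent(overriddenUA)) // false
```

In a page, the function would be called with window.navigator.userAgent. A randomized user agent only makes this single check pass; the remaining checks below are unaffected by it.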

Webdriver (new)

In order to automate Chrome headless, a new property webdriver is added to the navigator object (see Chromium code). Thus, by testing if the property is present it is possible to detect Chrome headless.

if (navigator.webdriver) {
    console.log('Chrome headless detected!')
}

Chrome (new)

window.chrome is an object that seems to provide features to Chrome extension developers. While it is available in vanilla mode, it is not available in headless mode.

if (!window.chrome) {
    console.log('Chrome headless detected')
}

Permissions (new)

It’s currently not possible to handle permissions in headless mode. Thus, it leads to an inconsistent state where Notification.permission and navigator.permissions.query report contradictory values.

// await is only valid inside an async function (or a module), so wrap the check
(async () => {
    const permissionStatus = await navigator.permissions.query({ name: 'notifications' })
    if (Notification.permission === 'denied' && permissionStatus.state === 'prompt') {
        console.log('This is Chrome headless!')
    } else {
        console.log('This is not Chrome headless')
    }
})()

Plugins (old)

navigator.plugins returns an array of the plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. By contrast, in headless mode, the returned array contains no plugins.

if (navigator.plugins.length === 0) {
    console.log('Chrome headless detected!')
}

Languages (old)

In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first one is the language of the browser UI, while the second one is an array of strings representing the user's preferred languages. However, in headless mode, navigator.languages returns an empty string.

if (navigator.languages === '') {
    console.log('Chrome headless detected')
}
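Since navigator.languages is specified as an array, the strict comparison with '' only matches builds that really return an empty string. A slightly more tolerant sketch, assuming a detector also wants to treat a missing value or an empty array as suspicious, could look like this. The helper name hasNoLanguages is hypothetical.

```javascript
// Hypothetical helper: accepts any navigator-like object so it can be
// exercised outside a browser.
function hasNoLanguages(nav) {
    const languages = nav.languages
    // Covers a missing property, the empty string described above,
    // and an empty array: both '' and [] have length 0.
    return languages === undefined || languages === null || languages.length === 0
}

console.log(hasNoLanguages({ languages: '' }))            // true
console.log(hasNoLanguages({ languages: [] }))            // true
console.log(hasNoLanguages({ languages: ['en-US', 'en'] })) // false
```

In a page, the call would be hasNoLanguages(window.navigator).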

WebGL

WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver.

With a vanilla Chrome on Linux, I obtain the following values for renderer and vendor: "Google SwiftShader" and "Google Inc.". In headless mode, I obtain "Mesa OffScreen", which is the technology used for rendering without any sort of window system, and "Brian Paul", the programmer who started the open source Mesa graphics library.

const canvas = document.createElement('canvas')
const gl = canvas.getContext('webgl')

const debugInfo = gl.getExtension('WEBGL_debug_renderer_info')
const vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL)
const renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL)

if (vendor == 'Brian Paul' && renderer == 'Mesa OffScreen') {
    console.log('Chrome headless detected')
}
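The snippet above throws if WebGL is unavailable, because getContext('webgl') and getExtension() both return null when the feature is not supported. A defensive sketch of the same check, with hypothetical helper names getWebGLInfo and looksLikeMesaOffScreen, might be:

```javascript
// Hypothetical helper: reads the unmasked vendor and renderer from a WebGL
// context, returning null when the context or the extension is missing.
function getWebGLInfo(gl) {
    if (!gl) return null
    const debugInfo = gl.getExtension('WEBGL_debug_renderer_info')
    if (!debugInfo) return null
    return {
        vendor: gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL),
        renderer: gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL)
    }
}

// Hypothetical helper: applies the vendor/renderer comparison from the
// article to the object returned by getWebGLInfo.
function looksLikeMesaOffScreen(info) {
    return info !== null && info.vendor === 'Brian Paul' && info.renderer === 'Mesa OffScreen'
}
```

In a page, this would be called as looksLikeMesaOffScreen(getWebGLInfo(document.createElement('canvas').getContext('webgl'))). A missing context yields false here; whether a detector should treat that case as suspicious in its own right is a separate design decision.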

Missing image

Finally, our last finding, which also seems to be the most robust, comes from the dimension of the image used by Chrome in case an image cannot be loaded.

In a vanilla Chrome, the image has a width and height that depend on the zoom of the browser but are different from zero. In a headless Chrome, the image has a width and a height equal to zero.

const img = document.createElement('img')
img.src = "http://iloveponeydotcom32188.jg"
document.body.appendChild(img)
img.onerror = () => {
    if (img.width == 0 && img.height == 0) {
        console.log('Chrome headless detected')
    }
}
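To illustrate why changing only the user agent did not help in the question, here is a rough sketch that runs several of the synchronous checks above against a window-like object and lists which ones still fire. The function name collectHeadlessSignals, the shape of the simulated environment, and the sample user agent are assumptions for illustration; the simulated values simply follow the headless behavior described in this answer.

```javascript
// Hypothetical aggregator: `env` is any object shaped like `window`.
function collectHeadlessSignals(env) {
    const nav = env.navigator || {}
    const signals = []
    if (/HeadlessChrome/.test(nav.userAgent || '')) signals.push('userAgent')
    if (nav.webdriver) signals.push('webdriver')
    if (!env.chrome) signals.push('chrome')
    if (!nav.plugins || nav.plugins.length === 0) signals.push('plugins')
    if (!nav.languages || nav.languages.length === 0) signals.push('languages')
    return signals
}

// Simulated headless environment in which only the user agent was overridden
// (illustrative values, not measured from a real browser).
const spoofedOnlyUA = {
    navigator: {
        userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        webdriver: true,
        plugins: [],
        languages: ''
    }
}

console.log(collectHeadlessSignals(spoofedOnlyUA).join(', '))
// prints "webdriver, chrome, plugins, languages"
```

In a page, the call would be collectHeadlessSignals(window). Every entry in the returned list is a property a detector can still read after the user agent has been replaced.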