I am using the headless Google Chrome browser with headless-chromium-php
to navigate to some websites, but it is always detected and shown a CAPTCHA.
I tried changing the user agent randomly using the \Campo\UserAgent plugin,
but nothing changed.
use HeadlessChromium\BrowserFactory;

// Pick a random desktop Windows user agent.
$UserAgent = \Campo\UserAgent::random([
    'os_type' => 'Windows',
    'device_type' => 'desktop'
]);
// Start headless Chrome with that user agent.
$browserFactory = new BrowserFactory('/opt/google/chrome/google-chrome');
$browser = $browserFactory->createBrowser([
    'sendSyncDefaultTimeout' => 5000,
    'userAgent' => $UserAgent
]);
$page = $browser->createPage();
$page->navigate($NextURL)->waitForNavigation();
Why is this happening?
Headless browser detectors usually combine several techniques to identify whether your browser is being remotely controlled. For example, they inspect browser environment properties such as the following:
User Agent (old)
This is the attribute commonly used to detect the user's OS and browser. On a Linux computer running Chrome version 63 in headless mode, it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/63.0.3071.115 Safari/537.36
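Thus, a page can detect headless Chrome by checking whether the user agent contains the HeadlessChrome token. Overriding the user agent, as in the question, only defeats this one check; the checks below still apply.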
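The user agent check described above could be sketched roughly as follows. The helper name hasHeadlessUserAgent is made up for illustration; in a real page it would be called with navigator.userAgent, while here it is fed the example string so the snippet also runs outside a browser.

```javascript
// Hypothetical helper (not from any library): returns true when the
// user agent string carries the "HeadlessChrome" token.
function hasHeadlessUserAgent(userAgent) {
  return /HeadlessChrome/.test(userAgent);
}

// In a page, a detector would pass navigator.userAgent instead.
const headlessUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/63.0.3071.115 Safari/537.36';

console.log(hasHeadlessUserAgent(headlessUA)); // true
```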
Webdriver (new)
In order to automate Chrome headless, a new property webdriver is added to the navigator object (see Chromium code). Thus, by testing if the property is present it is possible to detect Chrome headless.
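A detector relying on this property might look something like the sketch below. The function name hasWebdriverFlag is hypothetical, and the check treats any truthy navigator.webdriver value as a sign of automation, which is an assumption about how a given site tests it.

```javascript
// Hypothetical helper: takes a navigator-like object so it can be
// exercised outside a browser. In a page, pass window.navigator.
function hasWebdriverFlag(nav) {
  // Assumption: a truthy value means the browser is automated.
  return Boolean(nav && nav.webdriver);
}

console.log(hasWebdriverFlag({ webdriver: true })); // true
console.log(hasWebdriverFlag({}));                  // false
```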
Chrome (new)
window.chrome is an object that seems to provide features to Chrome extension developers. While it is available in vanilla mode, it’s not available in headless mode.
Permissions (new)
It’s currently not possible to handle permissions in headless mode. This leads to an inconsistent state where Notification.permission and navigator.permissions.query report contradictory values.
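The window.chrome and permissions checks could be sketched as below. Both function names are hypothetical, and the specific contradiction tested ('denied' from Notification.permission against 'prompt' from navigator.permissions.query) is an assumption about what the inconsistent state looks like. A mock window object stands in for the browser so the snippet runs on its own.

```javascript
// Hypothetical helper: true when the window-like object has no `chrome` property.
function lacksChromeObject(win) {
  return typeof win.chrome === 'undefined';
}

// Hypothetical helper: true when the two permission APIs disagree.
// Assumption: the contradiction is 'denied' versus 'prompt'.
async function hasInconsistentPermissions(win) {
  const status = await win.navigator.permissions.query({ name: 'notifications' });
  return win.Notification.permission === 'denied' && status.state === 'prompt';
}

// Mock of what a headless window might report (illustrative values only).
const fakeHeadlessWindow = {
  Notification: { permission: 'denied' },
  navigator: { permissions: { query: async () => ({ state: 'prompt' }) } },
};

console.log(lacksChromeObject(fakeHeadlessWindow)); // true
hasInconsistentPermissions(fakeHeadlessWindow).then((flag) => console.log(flag)); // true
```

In a real page, both helpers would simply be called with window.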
Plugins (old)
navigator.plugins returns an array of the plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. By contrast, in headless mode, the returned array contains no plugins.
Languages (old)
In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user’s preferred languages. However, in headless mode, navigator.languages returns an empty string.
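The plugins and languages checks could be sketched along these lines. The function names are hypothetical, and the languages check deliberately treats both an empty string and an empty array as "no languages", which is an assumption made to keep the sketch tolerant.

```javascript
// Hypothetical helper: true when the navigator-like object lists no plugins.
function hasNoPlugins(nav) {
  return nav.plugins.length === 0;
}

// Hypothetical helper: true when no preferred languages are reported.
// Assumption: an empty string, an empty array, or a missing value all count.
function hasNoLanguages(nav) {
  return !nav.languages || nav.languages.length === 0;
}

// Mock navigator objects (illustrative values only).
const fakeHeadlessNavigator = { plugins: [], languages: '' };
const fakeRegularNavigator = {
  plugins: [{ name: 'Chrome PDF viewer' }],
  languages: ['en-US', 'en'],
};

console.log(hasNoPlugins(fakeHeadlessNavigator), hasNoLanguages(fakeHeadlessNavigator)); // true true
console.log(hasNoPlugins(fakeRegularNavigator), hasNoLanguages(fakeRegularNavigator));   // false false
```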
WebGL
WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query the vendor and the renderer of the graphics driver.
With a vanilla Chrome on Linux, I obtain the following values for renderer and vendor: "Google SwiftShader" and "Google Inc.". In headless mode, I obtain "Mesa OffScreen", which is the technology used for rendering without any window system, and "Brian Paul", which is the name of the programmer who started the open source Mesa graphics library.
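One way a page could read these two values is through the WEBGL_debug_renderer_info extension, as in the sketch below. The function names are hypothetical, and comparing against the exact strings "Brian Paul" and "Mesa OffScreen" is an assumption based on the values quoted above; other machines and Chrome versions may report something else.

```javascript
// Hypothetical helper: reads vendor and renderer from a WebGL context.
// In a page: readWebGLIdentity(document.createElement('canvas').getContext('webgl'))
function readWebGLIdentity(gl) {
  const info = gl.getExtension('WEBGL_debug_renderer_info');
  if (!info) return null; // extension unavailable, nothing to compare
  return {
    vendor: gl.getParameter(info.UNMASKED_VENDOR_WEBGL),
    renderer: gl.getParameter(info.UNMASKED_RENDERER_WEBGL),
  };
}

// Hypothetical helper: true when the values match the headless ones quoted above.
function looksLikeHeadlessWebGL(identity) {
  return identity !== null &&
    identity.vendor === 'Brian Paul' &&
    identity.renderer === 'Mesa OffScreen';
}

// Mock WebGL context so the sketch runs outside a browser (illustrative only).
const fakeGl = {
  getExtension: () => ({ UNMASKED_VENDOR_WEBGL: 'vendor', UNMASKED_RENDERER_WEBGL: 'renderer' }),
  getParameter: (key) => ({ vendor: 'Brian Paul', renderer: 'Mesa OffScreen' }[key]),
};

console.log(looksLikeHeadlessWebGL(readWebGLIdentity(fakeGl))); // true
```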
Missing image
Finally, our last finding, which also seems to be the most robust, comes from the dimensions of the placeholder image Chrome uses when an image cannot be loaded.
In a vanilla Chrome, the placeholder has a width and a height that depend on the browser zoom but are different from zero. In a headless Chrome, the image has a width and a height equal to zero.
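The missing image check could be sketched as follows. The function name probeBrokenImage, the callback style, and the example.invalid URL are all hypothetical; a tiny mock document is used so the snippet runs outside a browser, whereas in a page you would pass the real document.

```javascript
// Hypothetical helper: loads an image that cannot exist and reports
// whether the broken-image placeholder has zero width and height.
function probeBrokenImage(doc, brokenUrl, onResult) {
  const img = doc.createElement('img');
  // Register the handler before setting src so the error is not missed.
  img.onerror = () => onResult(img.width === 0 && img.height === 0);
  img.src = brokenUrl;
  // The image is attached to the page so the placeholder gets laid out.
  doc.body.appendChild(img);
}

// Mock document that fails the load immediately (illustrative only).
const fakeDocument = {
  createElement: () => ({ width: 0, height: 0 }),
  body: { appendChild: (img) => img.onerror() },
};

probeBrokenImage(fakeDocument, 'https://example.invalid/missing.png',
  (looksHeadless) => console.log(looksHeadless)); // true
```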