How to detect Captcha farms and block Captcha bots

Brief Summary

Let's start with a brief introduction to what a Google reCaptcha farm is: a service that bot developers can query via an API to automate solving Google reCaptcha challenges. A typical flow looks like this:

  • The bot is blocked by a Captcha challenge.
  • It makes an API call to the Captcha farm with the website’s Captcha public key & its domain name as parameters.
  • The Captcha farm asks one of its workers to solve the Captcha.
  • After ~30-45 seconds, the Captcha is solved and the bot obtains its response token.
  • The bot passes the Captcha by submitting the response token.

In short, solving a Captcha is as simple as calling a function in the bot's code. The attacker doesn't even need to interact with the Google reCaptcha widget by clicking on it. If the attackers know the structure and the URL of the Google reCaptcha callback, i.e. the request in which the website sends the Google reCaptcha response token after a challenge has been solved (which is easy to find in the browser devtools), they can prove that they've solved a Captcha without even using a real browser.

Problem

My website is fully integrated with Google reCaptcha V2 (Invisible reCaptcha). The implementation follows all the steps listed in the documentation, and it worked like a charm until now. Over time, we have experienced different kinds of attacks that tried to break into our login. The one that caused the biggest problem was a dictionary attack combined with an automated Google reCaptcha solving mechanism. The attackers use farms (or maybe scripts) that solve the Google reCaptcha and generate unique response codes, which are then used by a bot network (different IP addresses around the world, User-Agents, browser fingerprints, etc.). With these codes, Google reCaptcha is taken out of the picture and we MUST use other mechanisms to block the attackers.

Question

I have reviewed the Google reCaptcha documentation multiple times, along with various topics related to this problem, but couldn't find an easy way to prevent such an attack. I have a few questions and would be very grateful if somebody could answer them:

  • Is it possible to bind the Google reCaptcha response code to a code challenge, cookie or something similar, in order to ensure that the code was generated by the same client that submits it?
  • Is there any way to distinguish Google reCaptcha codes obtained from a farm/script from the ones generated by the actual client?
  • I found some solutions such as DataDome, but they are very expensive. Is there something similar at a lower price, or an algorithm that I can implement on my own?

Big thanks in advance!

Script

Below is a simplification of the script that acts like a Google reCaptcha farm:

bypassReCaptcha();

function bypassReCaptcha() {
    grecaptcha.render(createPlaceholder(), buildConfiguration());
    grecaptcha.execute();
}

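function createPlaceholder() {
    // Append a dedicated container; rewriting body.innerHTML would re-parse the page and drop existing event listeners.
    const placeholder = document.createElement('div');
    placeholder.className = 'g-recaptcha-hacker';
    document.body.appendChild(placeholder);
    return placeholder;
}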

function buildConfiguration() {
    return {
        size: 'invisible',
        badge: 'bottomleft',
        sitekey: '<your site-key>',
        callback: (reCaptchaResponse) => localStorage.setItem('reCaptchaResponse', reCaptchaResponse)
    };
}

I am using server-side validation, something like this:

curl -X POST 'https://www.google.com/recaptcha/api/siteverify?secret=<your secret>&response=<generated code from above>&remoteip=<client IP address>'
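
For illustration, here is a minimal sketch of the same server-side check in JavaScript (Node 18+ is assumed for the built-in fetch). It relies on the success, hostname and challenge_ts fields of the siteverify JSON response. The function names evaluateSiteverify and verifyToken and the policy options expectedHostname and maxAgeMs are hypothetical, chosen only for this example. It is not a way to tell farm tokens apart; it only rejects tokens that failed verification, were issued for another hostname, or are older than a chosen age.

```javascript
// Hypothetical helper: decides whether a parsed siteverify response is acceptable.
// expectedHostname and maxAgeMs are illustrative policy knobs, not reCAPTCHA settings.
function evaluateSiteverify(result, { expectedHostname, maxAgeMs, now = Date.now() }) {
    if (!result || result.success !== true) {
        return { ok: false, reason: 'not-successful' };
    }
    // The token must have been issued for our own site.
    if (result.hostname !== expectedHostname) {
        return { ok: false, reason: 'hostname-mismatch' };
    }
    // challenge_ts is an ISO 8601 timestamp of the challenge; reject old tokens.
    const solvedAt = Date.parse(result.challenge_ts);
    if (Number.isNaN(solvedAt) || now - solvedAt > maxAgeMs) {
        return { ok: false, reason: 'stale-token' };
    }
    return { ok: true, reason: null };
}

// Sends the token to siteverify as a form-encoded POST body,
// which keeps the secret out of the URL and out of access logs.
async function verifyToken(secret, token, policy) {
    const body = new URLSearchParams({ secret, response: token });
    const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
        method: 'POST',
        body
    });
    return evaluateSiteverify(await res.json(), policy);
}
```

A call could look like verifyToken(secret, token, { expectedHostname: 'example.com', maxAgeMs: 60000 }), where the hostname and the age limit are placeholder values.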

It seems that the remoteip parameter is not working as expected: the validation succeeds regardless of the client IP. I checked some related topics and it seems that this is a common problem.
