How to bypass Cloudflare with Python


I am unable to scrape https://www.mentalhealthforum.net/. Every request returns a 403 status code, even though I have tried every solution I could find online. The site's Cloudflare protection uses hCaptcha, which makes it harder to bypass.

Here is my code:

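import cloudscraper

api_key = 'YOUR_2CAPTCHA_API_KEY'  # placeholder; the real key is not shown here


def scrape():
    baseurl = 'https://www.mentalhealthforum.net/'
    # Emulate mobile Chrome and hand any CAPTCHA to the 2captcha provider
    scraper = cloudscraper.create_scraper(
        delay=10,
        browser={'browser': 'chrome', 'platform': 'android', 'desktop': False},
        debug=True,  # print the request and the full response
        captcha={'provider': '2captcha', 'api_key': api_key},
    )
    response = scraper.get(baseurl)
    return response.status_code


print(scrape())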

Output:

< GET / HTTP/1.1
< Host: www.mentalhealthforum.net
< User-Agent: Mozilla/5.0 (Linux; Android 4.3; SM-G710 Build/JLS36C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36
< Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
< Accept-Language: en-US,en;q=0.9
< Accept-Encoding: gzip, deflate
<

> HTTP/1.1 403 Forbidden
> Date: Thu, 04 Aug 2022 04:44:23 GMT
> Content-Type: text/html; charset=UTF-8
> Transfer-Encoding: chunked
> Connection: close
> CF-Chl-Bypass: 1
> Permissions-Policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
> Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
> Expires: Thu, 01 Jan 1970 00:00:01 GMT
> X-Frame-Options: SAMEORIGIN
> Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
> Vary: Accept-Encoding
> Strict-Transport-Security: max-age=15552000; preload
> Server: cloudflare
> CF-RAY: 7354a33748c83384-DEL
> Content-Encoding: gzip
> alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
>
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>

<title>Attention Required! | Cloudflare</title>

<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" /><![endif]-->
<style>body{margin:0;padding:0}</style>


<!--[if gte IE 10]><!-->
<script>
  if (!navigator.cookieEnabled) {
    window.addEventListener('DOMContentLoaded', function () {
      var cookieEl = document.getElementById('cookie-alert');
      cookieEl.style.display = 'block';
    })
  }
</script>
<!--<![endif]-->


    <script>
    //<![CDATA[
    (function(){
      window._cf_chl_opt={
        cvId: "2",
        cType: "interactive",
        cNounce: "78912",
        cRay: "7354a33748c83384",
        cHash: "9d961b5f9b4ebe8",
        cUPMDTk: "\/?__cf_chl_tk=PkVh2nHuDkM8GSKSBdN6bF6yQ4tTFPfmUCVY6Zc6tQA-1659588263-0-gaNycGzNB_0",
        cFPWv: "b",
        cTTimeMs: "1000",
        cLt: "n",
        cRq: {
          ru: "aHR0cHM6Ly93d3cubWVudGFsaGVhbHRoZm9ydW0ubmV0Lw==",
          ra: "TW96aWxsYS81LjAgKExpbnV4OyBBbmRyb2lkIDQuMzsgU00tRzcxMCBCdWlsZC9KTFMzNkMpIEFwcGxlV2ViS2l0LzUzNy4zNiAoS0hUTUwsIGxpa2UgR2Vja28pIENocm9tZS82My4wLjMyMzkuMTExIE1vYmlsZSBTYWZhcmkvNTM3LjM2",
          rm: "R0VU",
          d: "cwDRskjJag43bMKn7QRhwyi8kyHqreuRwnGo+sqgbfN4uUqgwuI5Uv1VkkzWsGgvouW5wanxEIPAqrWZ7vK+KBXMwthn82Mzg2/gQpF36BPJJpPvfBg+vEE72VRJczxt02ALraAJiHgJW16MZfyPgjypbMsaCMt3lnB/3EWgzwkaeOtwJFzc7Wg5WN5RyuJNtXjZBYmU0LZVK9WYSYnyNQlZ0Mf5t7S+Y+ZTr8P5Z97W0VD12aSiHnFdXNUmAOWSOEOAxMi4a2F2U3O/kbEYsef1ouYIxKT9Nnmsw3mW2qbdnhOC24wIODeYC6DvHr5jZxRyFik3AdxHrtcKBRfLVLkvaiX6fkTdTlLMJ94p4hb8OYgh3r7qoAXyDX9gKf0pNGwF8BN6oFVMxauL+L9/Q+tXbSs5zWN3GZFe7XYcKQLMHXnrcw5s+WfCYwEkUzL0qoCg4B+JnQxF18GTXsXhhLmvDF00q71Fp3EzyBxZX54UELtPdu+IJMfo5uwb+Z62wDqWVYOQ9KDfUn9sJLl9xCFiN/gQNoyG9dgXrf9OmaxkQfEczBKa2lfAUu8a2CloY8qkGVHk55mg8SPrS2T09g==",
          t: "MTY1OTU4ODI2My41NzMwMDA=",
          m: "KejW80FyaOkUxmM47SXOqGP/cB+YHYlrIsoq37bq8Zs=",
          i1: "yDPbyhvL8j2VIY5Bqln6sg==",
          i2: "wetyBqKqvU25YTvZJoE21g==",
          zh: "OxIRYgLHg5p2pcbMMkuwgcVYeS4WO2VJlFLKmTgWwgg=",
          uh: "4vBxYA3Nh/bTpvXjoeGamwkVevjGpPRbEVqG2Joz1JM=",
          hh: "uGdsbGXsZlcoz7a5joNqzoj1ka1E1MNME2WxnV/IMIU=",
        }
      };
    }());
    //]]>
    </script>

<style>
  #cf-wrapper #spinner {width:69px; margin:  auto;}
  #cf-wrapper #cf-please-wait{text-align:center}
  .attribution {margin-top: 32px;}
  .bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
  #cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
  #cf-hcaptcha-container { text-align:center;}
  #cf-hcaptcha-container iframe { display: inline-block;}
  @keyframes fader     { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
  #cf-wrapper #cf-bubbles { width:69px; }
  @-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
  #cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
  #cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
  #cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
</style>
</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">

        <h1 data-translate="challenge_headline">One more step</h1>
        <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> www.mentalhealthforum.net</h2>

      </div>

      <div class="cf-section cf-highlight cf-captcha-container">
        <div class="cf-wrapper">
          <div class="cf-columns two">
            <div class="cf-column">

              <div class="cf-highlight-inverse cf-form-stacked">
                <form id="challenge-form" class="challenge-form interactive-form" action="/?__cf_chl_f_tk=PkVh2nHuDkM8GSKSBdN6bF6yQ4tTFPfmUCVY6Zc6tQA-1659588263-0-gaNycGzNB_0" method="POST" enctype="application/x-www-form-urlencoded">
    <div id='cf-please-wait'>
      <div id='spinner'>
        <div id="cf-bubbles">
            <div class="bubbles"></div>
            <div class="bubbles"></div>
            <div class="bubbles"></div>
        </div>
      </div>
      <p data-translate="please_wait" id="cf-spinner-please-wait">Please stand by, while we are checking your browser...</p>
      <p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting...</p>
      </div>
  <input type="hidden" name="md" value="oABL_BcWHmbxrANJi_DEsW4NSdZTow94pQNG.PPLozI-1659588263-0-AU8nZDIX4CuayrBCQWeR-rw7pdmEvwF9aAPyDTGsjUhKgb7lYEb4LbGEoGFg0yNqGuYBaPZJx_mvExqV9d-Hv4RYJZPBlsg53WXtldbxe2eNaTPPq7GiiSebI_x96CnQnwE36UsOwNvSV1WXGxqPrSf4YBqy59R5AU8TyGB-jnDZ549Kttr5AfJZMgLYXxuvxVfZ3H-d_V4-s22D-mdcyMwfvYf_SBd5ZD78UI23FYCHRX5pq2CbRfe1ntQ3gQSdD-6-JNASzzuXrHmbi1NrWLgx4bogZkUzkbjkSu2HmqIGq9-yWch8I12m2Osd1dYBE2VO13Xbmd73RHLaG1TPpKOBQf7izCaW4OUsdeEFHR0BfN7Z3b2b2JhfVHxT9RvUTr8xry2xF6OJ1MlG8GuHAImRAitEZJJKvue28KIWNR-VqS6BdaoFD-1C570aq_yegVKtV--50af-s_VBazSMECgS7usN9s4wA_piaOVH4TE2zK3kJZh7qeZlPa5GE6n5SEIhx6K7vg-uBXMMgzddGptCWz8zP9Kj8XLupIg_1Oi9_R3564djA04BCq98_9FHfW4DrYAyOodluZ8XHyNTQrYeTgP-N--GxNSEtyEHgw4tnw872nrFXNTtZkuPIoo6RA" />
  <input type="hidden" name="r" value="2RFjKAqAJ1wEDp7ctGkSzsKBpY.5nrLffwgE0dPn5So-1659588263-0-ARP6jAuzOyVp//1QkUdxc88mwa4hXj7KBBTY3yZEIYCNUeU/EWTep9CqlrAXm9nncXKVQNpiRHV5an6wE13xfCU0q7InPSRt170NvoHvcCiz5KsUgDEOxbgn3yr5n6B97g6TPJspvIF0p/+GIKd8yHpRIkIDNhlq/wszxqhvZ8GEyzBrZV97rKecUdzUhQ3wY6xPcJqY1cUB9vii/wac9GZ3GcaFs0oLKPWf6sZ6MN3q6inek4ahRfogGCTeNtWDp82pLgWMIofs6CdKZRA4NBdPnQvpB+OCAmd5ueuUVjfpNfrbCeqxN5TNbtThtbv9g22zA1SxYWP/CGtji2CuqLaaxjqsNUxoBQeLL9ERYy9qdhSvsTkmdwNpohoRo9nPsXAC09jGQ6GFSMkMEFL7OIkKIn9RZ7ttj+/OfMEH8kU8kGDANLze92S4EQLLUrSmiWoatPiOpxNfRgYq2DAY6HhgC87GMvfecjLlywtzmp9poC9OAnzdLVWGShxngN3KFj5BcIpIBswuQ1aC3n9UXjuhc+0rEbF2a+BOKjopzuH+njzJNjusseuxP7kC3+GZcb8Wc3OZNOd1CQhIWvnU5Y+Pn1E4myJLxFVIKqGn7KVhBY54oWULfMg9vF2jPFaojY1XDPKUbMGJed+VFQfVePwIc3+wikeeemM/I/2JjrxzxsbV8AP1YEoRpLhZObWa0p7wMmhT2gicvPPRaTkOgW3P4LKIFK1S7/gp48ENoxnpgOYdLbH6zsawjEMisg9K0WmGo9WzFlLhW6f1RzcRtM/kIs4n3aYdUzRnpHdRaAkjPyLVhEPG3gVAVLpIx3PmdMRgE2AylrikXjzG8L5CJcUKdE0KVOKYVWk68gEb13KAx801yQvQ4oOXkm/iH8hKljzUcGCCKWE0gocwxUtd3DleTo+vYdHdps08aHJrZ5Mqh+QsaCB8aIZVhPnJbLe7TMyFE61sgkuCaPUs+gF9AzNPB/gfqWsmjGXE7xDqKNNo4yGU6iKRfooKmMJVJ8lxwjF8XtlXkHcMmni6dT2AnbRVCGfRzt1ETTpjZkLAzcWXxg+5IrO/Xt4Nll6qMBz6ZIwPt8K9uh6QBKe76WMQd5szwqXnGXROF+mOg70Ro+H5roF610FiCEv7oEKdZ//AOSWRSfcYGznkrHjbhChx5pnjaQNJn72sAIGpe0w5XxHKdVssfpniEul0xeUihigo4JLBMLS93wmhQtuYHkxvgjCGz2QPI+iW08BKgda+s7277/vDLLTZnBWNvRWKbSYL6t7ZUb+y6zDlxnMrl8MBc3TKJcDdKBWc1DACottU8M9FFAohTYTKklygxYhbkS3Xrzs2wIQfPUPzwMKsTLdOuB+L/vPc8jWIIYQs8ca11+pge+WcN8ZQ/mRZLns1bno3bczan31UKKGs2/BFgvq4gd8HSEfchAzvKyL2nlihsNrGaehdFx2vavvi1XmQPs2oPZEXJx0GaSfuxKxlGetWDBoRt1Auwna7UXdt2Rxzx/HFeiUxNaNhDkIT02kvqewSTLt2ZoJifnQPADmx/88ek+PoZQZelHeFjZN1y57U3i38jfpmYmm5Yw8uXIb5Z/iXcK/UCab0+/wfZAhcuu421vkoakSuzI+bmMbj6IqLSD/zkaFDL8wSGCtLnsZ0rMNMScj2/f9jMescJYeF/2VAq/1vlH/93yuR4KJXA4MWfk6s49lOaxjv8Vh6WU8rRTh4YrimPtKi/BUPgG+0BC95nmwiPcjrtrO8C52ITotpWCmLq6tIBICpct8XE3kCERA0kOy038JHrCXMvBhA3MucRyrjfa5J8gViq7sePQc/P4bGqUsrOrAFm0O1Bp6zzn/Zp7xQGTbU5a+ZFjsdZHUv6ax3zys4vmxVTMI0OPnoxRx7q9pc
qGI0SgM5HPPIK80GLchJ2DBPxVNNtTdayu9GUWwCZOb8tXmg2pTjHDR24Oft46ccM46p8Y/ekkJfRC2vCw==">
  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">
  <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
    <div id="no-cookie-warning" class="cookie-warning" data-translate="turn_on_cookies" style="display:none">
      <p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
    </div>
  <script>
  //<![CDATA[
    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
      b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
      b(function(){
        var cookiesEnabled=(navigator.cookieEnabled)? true : false;
        if(!cookiesEnabled){
          var q = document.getElementById('no-cookie-warning');q.style.display = 'block';
        }
      });
  //]]>
  </script>
  <div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/transparent.gif?ray=7354a33748c83384')"></div>
</form>
  <script>
    //<![CDATA[
    (function(){
        var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
        var trkjs = isIE ? new Image() : document.createElement('img');
        trkjs.setAttribute("src", "/cdn-cgi/images/trace/captcha/js/transparent.gif?ray=7354a33748c83384");
        trkjs.id = "trk_captcha_js";
        trkjs.setAttribute("alt", "");
        document.body.appendChild(trkjs);
        var cpo=document.createElement('script');
        cpo.type='text/javascript';
        cpo.src = '/cdn-cgi/challenge-platform/h/b/orchestrate/captcha/v1?ray=7354a33748c83384';

        window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
        window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, -window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;
        if (window._cf_chl_opt.cUPMDTk && window.history && window.history.replaceState) {
          var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
          history.replaceState(null, null, "\/?__cf_chl_rt_tk=PkVh2nHuDkM8GSKSBdN6bF6yQ4tTFPfmUCVY6Zc6tQA-1659588263-0-gaNycGzNB_0" + window._cf_chl_opt.cOgUHash);
          cpo.onload = function() {
            history.replaceState(null, null, ogU);
          };
        }

        document.getElementsByTagName('head')[0].appendChild(cpo);
    }());
    //]]>
    </script>


              </div>
            </div>

            <div class="cf-column">
              <div class="cf-screenshot-container">

                <span class="cf-no-screenshot"></span>

              </div>
            </div>
          </div>
        </div>
      </div>

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>

            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>


            <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>

            <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>


              <p data-translate="resolve_captcha_privacy_pass">Another way to prevent getting this page in the future is to use Privacy Pass. Check out the browser extension in the <a rel="noopener noreferrer" href="https://chrome.google.com/webstore/detail/privacy-pass/ajhmfdgkijocedmfjonnpjfojldioehi">Chrome Web Store</a>.</p>


          </div>
        </div>
      </div>


      <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7354a33748c83384</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">49.36.219.70</span>
      <span class="cf-footer-separator sm:hidden">&bull;</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" target="_blank">Cloudflare</a></span>

  </p>
  <script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->


    </div>
  </div>

  <script>
  window._cf_translation = {};


</script>


</body>
</html>

403

Does anyone know how to solve this problem?
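
The debug output above already says a fair amount about what Cloudflare returned. As a rough diagnostic sketch (not part of cloudscraper; the function names here are made up for illustration), the markers visible in that response can be checked programmatically, and the short base64 fields inside `cRq` can be decoded to see what they hold. The check is only a heuristic based on this one response, and it assumes `headers` is a plain dict keyed exactly as printed above.

```python
import base64
import re


def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic based on the markers visible in the response above."""
    server = headers.get("Server", "").lower()
    return (
        status_code == 403
        and server == "cloudflare"
        and "_cf_chl_opt" in body
    )


def challenge_type(body):
    """Return the cType value from window._cf_chl_opt, or None if absent."""
    match = re.search(r'cType:\s*"([^"]+)"', body)
    return match.group(1) if match else None


def decode_field(value):
    """Decode one of the base64-encoded cRq fields into text."""
    return base64.b64decode(value).decode("utf-8")


# Values copied from the response above (body shortened to the relevant part).
sample_headers = {"Server": "cloudflare"}
sample_body = 'window._cf_chl_opt={ cvId: "2", cType: "interactive", cRq: {} };'

print(looks_like_cloudflare_challenge(403, sample_headers, sample_body))
print(challenge_type(sample_body))
print(decode_field("aHR0cHM6Ly93d3cubWVudGFsaGVhbHRoZm9ydW0ubmV0Lw=="))  # cRq.ru
print(decode_field("R0VU"))  # cRq.rm
```

Here `ru` decodes to the requested URL and `rm` to the request method, and `cType` is "interactive", which together with the hCaptcha container in the HTML suggests the page expects a challenge to be completed in a browser rather than different request headers.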


There are 4 best solutions below

0
On

I came across this post looking for a similar solution myself. I actually found an answer after reading this thread and not being satisfied with the answers (certainly they did not solve the issue I had).

ZenRows appears to be a good solution when the site is behind Cloudflare's v2 challenges. It is a tiered service, but if you only need to scrape occasionally, the free API key you can request should be enough.

I tested on this site and I don't get 403'd:

import requests 

response = requests.get("https://api.zenrows.com/v1/?apikey=YOUR_API_KEY&url=https%3A%2F%2Fwww.mentalhealthforum.net%2F") 

print(response.text)
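
The target URL in the request above is percent-encoded by hand. A small sketch of building the same query string with the standard library follows; the endpoint and the `apikey` and `url` parameter names are taken from the snippet above, while `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"  # endpoint from the snippet above


def build_request_url(api_key, target_url):
    """Return the endpoint URL with the API key and percent-encoded target."""
    query = urlencode({"apikey": api_key, "url": target_url})
    return f"{ZENROWS_ENDPOINT}?{query}"


url = build_request_url("YOUR_API_KEY", "https://www.mentalhealthforum.net/")
print(url)
```

Passing the same dictionary as `params=` to `requests.get` should produce an equivalent URL without any manual encoding.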

Here was where I generated a free API key: https://app.zenrows.com/builder

Let me know how this works.

0
On

I was using Selenium in Python. Whenever it looked like the site was going to require a CAPTCHA to verify that I was human, I just closed and reopened the browser, and I was then able to visit the site and scrape it. Hope this helps someone.
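
A minimal sketch of that close-and-reopen approach, assuming a driver object that exposes `get`, `page_source`, and `quit` the way Selenium drivers do. `fetch_with_restart`, `is_blocked`, and `FakeDriver` are hypothetical names; `FakeDriver` only stands in for a real browser so the sketch runs on its own, and the "Attention Required" check is taken from the challenge page title in the question.

```python
def fetch_with_restart(make_driver, url, is_blocked, max_attempts=3):
    """Open url in a fresh browser; if a challenge page comes back, quit and retry."""
    for _ in range(max_attempts):
        driver = make_driver()
        try:
            driver.get(url)
            html = driver.page_source
        finally:
            driver.quit()  # always close the browser before the next attempt
        if not is_blocked(html):
            return html
    return None  # every attempt ended on a challenge page


class FakeDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""
    pages = iter(["<title>Attention Required! | Cloudflare</title>", "<h1>Forum</h1>"])
    quit_calls = 0

    def get(self, url):
        self.page_source = next(FakeDriver.pages)

    def quit(self):
        FakeDriver.quit_calls += 1


def is_blocked(html):
    # Marker taken from the challenge page title shown in the question
    return "Attention Required" in html


result = fetch_with_restart(FakeDriver, "https://example.com/", is_blocked)
print(result)
```

With a real browser, `make_driver` would be something that returns a new Selenium driver on each call. Whether a restart actually avoids the challenge depends on the site, so this is a pattern to try rather than a guaranteed fix.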

0
On

It's hard to bypass Cloudflare as a whole because it is constantly updated. You could attempt to solve its challenges yourself, but that would require quite good reverse-engineering skills. If your scraping does not need to be very fast, I'd recommend a browser-based solution, something like Playwright or Selenium. If you do want to "solve" Cloudflare, you will have to do quite a bit of background research on reverse engineering, TLS, ciphers, and the like.

0
On

There's a Stack Overflow tag, https://stackoverflow.com/questions/tagged/undetected-chromedriver (with over 250 questions already), that's dedicated to a Python repo (https://github.com/ultrafunkamsterdam/undetected-chromedriver) with the sole purpose of bypassing Cloudflare. There, you will find the answers you seek.

In addition to that, there are other Python frameworks, such as https://github.com/seleniumbase/SeleniumBase, that also have the ability to bypass CAPTCHAs.

After running pip install seleniumbase, try running this script with python. It opens a URL protected by Cloudflare:

from seleniumbase import Driver

driver = Driver(uc=True)
driver.get("https://www.g2.com/categories/video")
driver.sleep(3)
driver.quit()

For slower internet connections, you may need to adjust one line to evade detection:

from seleniumbase import Driver

driver = Driver(uc=True)
driver.uc_open_with_reconnect("https://www.g2.com/categories/video", 4)
driver.sleep(3)
driver.quit()