I know the webscraping and I have taken the data from different website and I am using python language and selenium webdriver chrome. But I call a website it is open front page and then I click or go any other page then website restrict me and website know that I am using automated chrome.
How can I scrape a website without getting detected and bypassing reCAPTCHA using selenium webdriver through Python?
7.9k Views Asked by Imran Rafiq At
2
There are 2 best solutions below
0
On
These days, websites can detect your program as a BOT pretty easily. Currently Google have 4(four) reCAPTCHA to choose and implement from when creating a new site.
- reCAPTCHA v3
- reCAPTCHA v2 ("I'm not a robot" Checkbox)
- reCAPTCHA v2 (Invisible reCAPTCHA badge)
- reCAPTCHA v2 (Android)
Solution
However there are some generic approaches to avoid getting detected while web-scraping:
- The first and foremost attribute a website can determine your script/program is through your monitor size. So it is recommended not to use the conventional Viewport.
- If you need to send multiple requests to a website keep on changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
- To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing
time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
Outro
See:
This may be because the website uses reCAPTCHA v3, which "allows you to verify if an interaction is legitimate without any user interaction". This means that they can identify if you are not a human without asking you to check the famous "I'm not a robot" box. That box is used in the former version of reCAPTCHA, v2.
Read more about reCAPTCHA here: https://developers.google.com/recaptcha/docs/versions
I don't think it's possible to work around this with Selenium. And, as was already mentioned, web scraping is often illegal.