I am trying to scrape some data from amazon.com using Python and Selenium, and I need to get past the captchas that appear on some pages. I am using the 2captcha service to solve them, but I often get the error ERROR_CAPTCHA_UNSOLVABLE, which means the service could not solve the captcha.
Here is the code that I use to get the captcha token from 2captcha:
from twocaptcha import TwoCaptcha
solver = TwoCaptcha('my_api_key')
site_key = '6Lc_aCMTAAAAABx7u2W0WPXnVbI_v6ZdbM6rYf16' # site key for amazon.com
url = 'https://www.amazon.com/s?k=books' # example url with captcha
try:
    result = solver.recaptcha(sitekey=site_key, url=url)
    token = result['code']
except Exception as e:
    print(e)
And here is the code that I use to apply the token to the captcha element:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get(url)
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, 'iframe[title="recaptcha challenge"]')))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'recaptcha-token'))).click()
driver.switch_to.default_content()
# Pass the token as a script argument instead of formatting it into the JavaScript source
driver.execute_script('document.getElementById("g-recaptcha-response").innerHTML = arguments[0];', token)
I don't know how to handle ERROR_CAPTCHA_UNSOLVABLE when 2captcha returns it. I have tried refreshing the page and requesting a new captcha token, but that doesn't work. I have also tried increasing the solver's timeout and polling interval, but that doesn't help either.
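For reference, this is roughly the retry logic I have in mind, as a minimal sketch. The helper name solve_with_retry, the attempt count, and the delays are my own placeholders, not part of the 2captcha library, and I am assuming the error code appears in the text of the exception that the solver raises.

```python
import time

UNSOLVABLE = "ERROR_CAPTCHA_UNSOLVABLE"


def solve_with_retry(solve, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call solve() until it returns a result or the attempts run out.

    solve is any zero-argument callable that returns a dict with a 'code' key.
    Only errors whose message contains ERROR_CAPTCHA_UNSOLVABLE are retried
    (assumption: the error code is part of the exception text).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return solve()["code"]
        except Exception as exc:
            if UNSOLVABLE not in str(exc):
                raise  # unrelated failure: do not hide it behind retries
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"captcha still unsolvable after {max_attempts} attempts"
    ) from last_error
```

With the solver from above, I would call it as token = solve_with_retry(lambda: solver.recaptcha(sitekey=site_key, url=url)). Each attempt submits a new task to the service, so retries are not free.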
Is there any way to fix this error or avoid it altogether? Are there any alternatives to 2captcha that work better for amazon.com captchas? Any suggestions or advice would be appreciated. Thanks in advance.