Zillow web scraping using Selenium: PX CAPTCHA keeps reappearing


I am working on a Selenium project that goes to Zillow, finds homes for rent, and returns their details, i.e. the rental link, price and address.

This is my code:

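from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# CHROME_DRIVER_PATH, ZILLOW_HOUSES_URL and the *_CSS_SELECTOR constants
# are defined elsewhere in my script.
# Selenium 4 takes the driver path through a Service object
# (executable_path= is deprecated and was removed in Selenium 4.10).
driver = webdriver.Chrome(service=Service(CHROME_DRIVER_PATH))

driver.get(ZILLOW_HOUSES_URL)

house_links = driver.find_elements(By.CSS_SELECTOR, LINKS_CSS_SELECTOR)
prices = driver.find_elements(By.CSS_SELECTOR, PRICES_CSS_SELECTOR)
addresses = driver.find_elements(By.CSS_SELECTOR, ADDRESSES_CSS_SELECTOR)

for link in house_links:
    print(link.get_attribute('href'))
for price in prices:
    # Keep only the leading price, e.g. "$1,850/mo" -> "$1,850"
    print(price.text.split('+')[0].split(', ')[0].split('/')[0])
for address in addresses:
    print(address.text)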
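If the goal is to return each listing's link, price and address together rather than printing three separate lists, one option is to pair them up by index. This is a minimal sketch, assuming the three `find_elements` calls return the listings in the same order with exactly one element per listing; `clean_price` and `build_listings` are hypothetical helper names, and they take plain strings so they can be tried without opening a browser.

```python
def clean_price(raw: str) -> str:
    # Same split chain as the loop above (drop anything after '+', ', ' or '/'),
    # plus strip() to remove stray whitespace.
    # e.g. "$1,850/mo" -> "$1,850", "$2,195+ 1 bd" -> "$2,195"
    return raw.split('+')[0].split(', ')[0].split('/')[0].strip()


def build_listings(links, prices, addresses):
    # Assumes the three lists line up one-to-one (same listing at the same index).
    # zip() silently stops at the shortest list, so a listing card that is missing
    # a price or address would shift the pairing; refuse mismatched lengths instead.
    if not (len(links) == len(prices) == len(addresses)):
        raise ValueError(
            f"length mismatch: {len(links)} links, {len(prices)} prices, "
            f"{len(addresses)} addresses"
        )
    return [
        {"link": link, "price": clean_price(price), "address": address}
        for link, price, address in zip(links, prices, addresses)
    ]
```

With the WebElements from the question, the inputs would be `[l.get_attribute('href') for l in house_links]`, `[p.text for p in prices]` and `[a.text for a in addresses]`.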

Most of the time when I run it, the browser does open the Zillow page, but then the PX (PerimeterX) "press and hold" CAPTCHA appears. I press and hold, but it comes back saying "Try again". I try again, and it never stops. How can I get past this?

Answer by QuentinJS

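You need to make sure cookies can be saved between runs, which you can do by giving Chrome a persistent profile directory. This got me past the CAPTCHA. The profile directory has to be an absolute (fully qualified) path, otherwise Chrome complains.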

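import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Persistent profile (cookies etc.) in ./selenium, passed as an absolute path
sel_path = os.path.join(os.getcwd(), 'selenium')
chrome_options = Options()
chrome_options.add_argument("user-data-dir=" + sel_path)
driver = webdriver.Chrome(options=chrome_options)
driver.get(zillow_path)  # zillow_path: the Zillow search URL you want to scrape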
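The key detail in this answer is that the `user-data-dir` profile path must be absolute. A small helper can make that explicit; this is only a sketch, and `profile_dir_arg` is a hypothetical name, not part of Selenium. It resolves a relative directory name against the current working directory, which is what `os.path.join(os.getcwd(), 'selenium')` does by hand.

```python
from pathlib import Path


def profile_dir_arg(dirname: str = "selenium") -> str:
    """Return a Chrome 'user-data-dir=...' argument that uses an absolute path.

    A relative name such as "selenium" is resolved against the current working
    directory, so Chrome always receives a fully qualified profile path.
    """
    path = Path(dirname).resolve()
    return f"user-data-dir={path}"
```

It would be used as `chrome_options.add_argument(profile_dir_arg())` in place of building `sel_path` manually.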