Why can't I scrape all the data from Zillow?

930 Views Asked by Mr-Sepi0l At 07 February 2023 at 20:55

I'm trying to scrape data (prices) from Zillow as practice with Python, but I'm not getting the complete data.

This is my code:

from jobEntryBot import JobEntryBot
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from pprint import pprint
import time
import requests
URL_ZILLOW = r"https://www.zillow.com/homes/for_rent/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22mapBounds%22%3A%7B%22west%22%3A-123.4663871665039%2C%22east%22%3A-121.7744926352539%2C%22south%22%3A37.03952097286371%2C%22north%22%3A38.19687379258651%7D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22price%22%3A%7B%22max%22%3A872627%7D%2C%22beds%22%3A%7B%22min%22%3A1%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22mp%22%3A%7B%22max%22%3A3000%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fsba%22%3A%7B%22value%22%3Afalse%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A9%7D"

header = {
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/511.22 (KHTML, like Gecko) Chrome/139.3.3.3 Safari/312.311',
    'Accept-Language': 'en-US,en;q=0.9'
}
data = requests.get(headers=header, url=URL_ZILLOW)

soup = BeautifulSoup(data.text, "html.parser")
selector_for_prices = ".gMDnGj span"
prices = soup.select(selector_for_prices)
for price in prices:
    print(price.text)

I tried this, but I **only get 9 prices**, not all of the 40-something prices on the webpage.

I've tried using other functions like soup.find_all(), but that doesn't work either. I've even tried using Selenium. If I inspect the Zillow page in the browser and use the same selector as in my code, it works there, but not from my code. P.S.: FYI, I changed the user agent in the code shown here.

There is 1 best solution below

Übermensch On 07 February 2023 at 21:44 BEST ANSWER

Since the website has bot-detection capabilities, you will first need to find a way to avoid detection. This post contains a comprehensive list of methods to avoid detection.
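Before trying to avoid detection, it may help to check whether the response was actually blocked. The following is only a minimal sketch: `looks_blocked`, `BLOCK_STATUSES`, and `BLOCK_MARKERS` are hypothetical names, and the status codes and marker strings are assumptions about what a block page might contain, not anything documented by Zillow.

```python
# Hypothetical helper, not part of any library: guesses whether a response
# is an anti-bot block page rather than the real listing page.

# Assumed signals; a real block page may use other codes or wording.
BLOCK_STATUSES = {403, 429}
BLOCK_MARKERS = ("captcha", "access denied")


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if the status code or the body text suggests a block page."""
    if status_code in BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# With the question's code this would be called as:
#     looks_blocked(data.status_code, data.text)
print(looks_blocked(403, ""))
print(looks_blocked(200, "<html>Please solve the CAPTCHA</html>"))
print(looks_blocked(200, "<html><span>$2,500/mo</span></html>"))
```

If this returns False and the selector still matches only a few elements, the remaining listings may simply not be present in the HTML that requests receives, which would be a different problem from detection.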

It may also be worth looking into the APIs Zillow offers, as there does not seem to be a simple way to scrape their website. But if you're just doing this for fun or as a personal learning experience, then it is definitely worth taking some time to figure out the best approach to scraping Zillow.
