Extracting Text from Span Tag using BeautifulSoup


I am trying to extract the estimated monthly cost of "$1,773" from this URL:

https://www.zillow.com/homedetails/4651-Genoa-St-Denver-CO-80249/13274183_zpid/


Upon inspecting that part of the page, I see this data:

<div class="sc-qWfCM cdZDcW">
   <span class="Text-c11n-8-48-0__sc-aiai24-0 dQezUG">Estimated monthly cost</span>
   <span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$1,773</span></div>

To extract $1,773, I have tried this:

from bs4 import BeautifulSoup
import requests

url = 'https://www.zillow.com/homedetails/4651-Genoa-St-Denver-CO-80249/13274183_zpid/'
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

print(soup.find_all('span', {'class': 'Text-c11n-8-48-0__sc-aiai24-0 jLucLe'}))

This returns a list of three elements, with no mention of $1,773.

[<span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$463,300</span>, 
<span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$1,438</span>, 
<span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$2,300<!-- -->/mo</span>]

Can someone please explain how to return $1,773?

There are 2 best solutions below

5
Artsiom Liaver On

I think you have to find the parent element first and then search inside it. For example:

# Locate the enclosing div first, then search only within it.
parent_div = soup.find('div', {'class': 'sc-fzqBZW bzsmsC'})
result = parent_div.find_all('span', {'class': 'Text-c11n-8-48-0__sc-aiai24-0 jLucLe'})
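
The class names on this page look auto-generated, so they may not stay the same between page loads. As a rough sketch of the same "scope the search first" idea, the snippet below anchors on the visible label text instead and then steps to the neighbouring span. It assumes the markup is exactly what the question shows and that the value is actually present in the HTML being parsed; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; a real page would come from requests or Selenium.
html = """
<div class="sc-qWfCM cdZDcW">
   <span class="Text-c11n-8-48-0__sc-aiai24-0 dQezUG">Estimated monthly cost</span>
   <span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$1,773</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the label span by its visible text, then step to the next <span> sibling.
label = soup.find("span", string="Estimated monthly cost")
value = label.find_next_sibling("span") if label else None

# Guard against a missing label or value instead of raising AttributeError.
cost = value.get_text(strip=True) if value else None
print(cost)
```

If `cost` comes back as `None` on the live page, the label is probably not in the HTML that was downloaded, which points to the content being loaded later by JavaScript.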

0
NoorJafri On

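When parsing a web page, it helps to separate the content by how it is rendered. Some components are static, meaning they are already present in the HTML the server returns, while others are dynamic, meaning JavaScript fills them in after the page loads. Dynamic content also takes some time to appear, because the page has to call a backend API of some sort to fetch it. requests does not run JavaScript, so it only ever sees the static part.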
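
One quick way to tell which case you are in is to check whether the value appears anywhere in the raw HTML before any JavaScript has run. The sketch below is only an illustration: `is_in_static_html` is a hypothetical helper, and `sample_html` is a stand-in built from the spans that the question's requests call returned, not a real response.

```python
from bs4 import BeautifulSoup


def is_in_static_html(html: str, needle: str) -> bool:
    """Return True if `needle` appears in the visible text of the raw HTML."""
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    return needle in text


# Stand-in for requests.get(url, headers=headers).text, using spans
# from the output shown in the question.
sample_html = """
<span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$463,300</span>
<span class="Text-c11n-8-48-0__sc-aiai24-0 jLucLe">$1,438</span>
"""

print(is_in_static_html(sample_html, "$1,438"))  # present in the static HTML
print(is_in_static_html(sample_html, "$1,773"))  # absent, so likely rendered later
```

If the value is missing from the static HTML, no BeautifulSoup selector will find it, and a tool that runs the page's JavaScript is needed.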


I tried parsing your page using Selenium with ChromeDriver, which runs the page's JavaScript before the elements are read:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.zillow.com/homedetails/4651-Genoa-St-Denver-CO-80249/13274183_zpid/")
# Give the page's JavaScript time to load the dynamic content.
time.sleep(6)
# Selenium 4 style locator: find_elements with By.XPATH.
el = driver.find_elements(By.XPATH, "//span[@class='Text-c11n-8-48-0__sc-aiai24-0 jLucLe']")

# Print the visible text of every matching span.
for e in el:
    print(e.text)

driver.quit()

Output:
$463,300
$1,773
$2,300/mo