How to use Python to get information from the map navigation container of a website?


I am learning Python and want to extract some information from the map navigation container of this website, https://findmasa.com/view/map#b1cc410b, such as the mural id, latitude, longitude, artist name, address, city, and state.

Below is the code I have tried so far, but the output is always NO DATA. My coding skills are limited, so any help would be sincerely appreciated!

from bs4 import BeautifulSoup
import requests

url = 'https://findmasa.com/view/map#b1cc410b'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

li_element = soup.find('li', id='b1cc410b')

if li_element:
    data_lat = li_element['data-lat']
    data_lng = li_element['data-lng']
    artist_name = li_element.find('a').text
    address = li_element.find_all('p')[1].text
    city = li_element.find_all('p')[2].text

    print('LATITUDE ', data_lat)
    print('LONGITUDE ', data_lng)
    print('ARTIST ', artist_name)
    print('ADDRESS ', address)
    print('CITY ', city)
else:
    print('NO DATA')

Best answer

The information you're looking for is not part of the initial HTML; it is added to the page afterwards by JavaScript. The requests library only downloads the raw HTML and does not execute JavaScript, so the li element is missing from the response. As a result, soup.find returns None, the if condition is false, and the else branch prints NO DATA.
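To see why the lookup fails, it can help to inspect what a plain HTTP fetch actually contains. The sketch below uses only the standard library and a made-up HTML string (static_html is an assumption standing in for the server's initial response, not the site's real markup, and IdCollector is a hypothetical helper) to show that an element injected later by JavaScript is simply absent from the raw HTML:

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every start tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'id' and value:
                self.ids.add(value)


# Hypothetical stand-in for the HTML returned by requests.get(url).content:
# the list container exists, but its <li> items are added later by a script.
static_html = """
<html><body>
  <ul id="mural-list"></ul>
  <script src="load-murals.js"></script>
</body></html>
"""

collector = IdCollector()
collector.feed(static_html)

print(collector.ids)
print('b1cc410b' in collector.ids)  # False: the <li> is not in the static HTML
```

Running the same kind of check on the real response.text is a quick way to confirm whether a page needs a browser-based tool before reaching for one.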

You can use Selenium instead, which drives a real browser and therefore runs the page's JavaScript.

Here's a solution:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC

# Create a Chrome driver instance
driver = Chrome()

url = 'https://findmasa.com/view/map#b1cc410b'
driver.get(url)

# Wait up to 10 seconds for the li element with id 'b1cc410b' to be present on the page
li_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'li#b1cc410b'))
)

# Latitude and longitude are stored as data attributes on the li element
data_lat = li_element.get_attribute('data-lat')
data_lng = li_element.get_attribute('data-lng')
artist_name = li_element.find_element(By.TAG_NAME, 'a').text

# The second and third <p> elements hold the address and the "City, State" text
paragraphs = li_element.find_elements(By.TAG_NAME, 'p')
address = paragraphs[1].text
city = paragraphs[2].text

# Print the extracted data
print(data_lat)
print(data_lng)
print(artist_name)
print(address)
print(city)

# Close the browser once the data has been read
driver.quit()

output:

34.102025
-118.32694167
Tristan Eaton
6301 Hollywood Boulevard
Los Angeles, California

You can install Selenium using pip:

pip install selenium
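
The question also asks for the city and the state separately, while the third paragraph holds both in one string. Assuming the text always has the form "City, State" as in the output above (an assumption; other murals may be formatted differently), a small hypothetical helper, split_city_state, could separate them:

```python
def split_city_state(location):
    """Split a 'City, State' string into (city, state).

    Returns (location, None) when there is no comma to split on.
    """
    # rpartition splits on the last comma, so a comma inside the city survives
    city, sep, state = location.rpartition(',')
    if not sep:
        return location.strip(), None
    return city.strip(), state.strip()


city, state = split_city_state('Los Angeles, California')
print(city)   # Los Angeles
print(state)  # California
```

In the Selenium script, the argument would be the text of the third p element rather than a literal string.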

[UPDATE]:

  • If the id is known beforehand, store it in a variable and build both the URL and the selector from it with an f-string, so the same code works for any mural.
  • The InvalidSelectorException that you get for some URLs (or rather, for some ids) most likely occurs because a CSS id selector cannot begin with a digit, so li#1456a64a is not a valid selector. To avoid it, use the attribute notation li[id="id_value"] instead of li#id_value.
    from selenium.webdriver import Chrome
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    import selenium.webdriver.support.expected_conditions as EC
    
    # Create a Chrome driver instance
    driver = Chrome()
    variable_name = '1456a64a'  # other example ids: fdd8a7d5, b1cc410b
    url = f'https://findmasa.com/view/map#{variable_name}'
    driver.get(url)
    
    # Wait for the li element whose id equals variable_name to be present on the page
    li_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'li[id="{variable_name}"]')))
    
    data_lat = li_element.get_attribute('data-lat')
    data_lng = li_element.get_attribute('data-lng')
    artist_name = li_element.find_element(By.TAG_NAME, 'a').text
    address = li_element.find_elements(By.TAG_NAME, 'p')[1].text
    city = li_element.find_elements(By.TAG_NAME, 'p')[2].text
    
    # Print the extracted data
    print(data_lat)
    print(data_lng)
    print(artist_name)
    print(address)
    print(city)
    
    output:
    34.0536722
    -118.3041877
    unknown
    960 South Harvard Boulevard
    Los Angeles, California
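
The reason behind the selector change can be checked without a browser. A CSS identifier may not begin with a digit, so li#1456a64a is not valid CSS, whereas the attribute form accepts any quoted value. The helpers below (starts_with_digit and li_selector are names introduced here for illustration, not part of Selenium) sketch how the attribute selector could be built for any id:

```python
def starts_with_digit(element_id):
    """True when the id begins with a digit, which makes 'li#<id>' invalid CSS."""
    return element_id[:1].isdigit()


def li_selector(element_id):
    """Build an attribute selector that is valid for any id value."""
    # Escape backslashes and double quotes so the value stays inside the quotes.
    escaped = element_id.replace('\\', '\\\\').replace('"', '\\"')
    return f'li[id="{escaped}"]'


for mural_id in ('b1cc410b', '1456a64a'):
    print(mural_id, starts_with_digit(mural_id), li_selector(mural_id))
# b1cc410b False li[id="b1cc410b"]
# 1456a64a True li[id="1456a64a"]
```

The returned string can be passed as the second item of the (By.CSS_SELECTOR, ...) tuple in place of the inline f-string.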