Need Help Extracting Redirect URL from a div Element with Specific Class Name in Python Selenium


I tried the code below, but it does not work, and I could not find a solution in any of the resources I searched. It seems the URL redirection is handled by JavaScript, without an <a> tag or an onclick event. Please help.

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By


def SCHLproject(query):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    driver = webdriver.Chrome(options=chrome_options)

    url = f"https://www.truemeds.in/search/{str(query)}"
    print(driver.page)
    driver.get(url)
    ele = driver.find_element(By.CLASS_NAME, "sc-452fc789-2 eIXiYR")
    ele.click()
    print(driver.current_url)
    
SCHLproject("naxdom")

sashkins

Indeed, in your case the redirect URL is not stored in any DOM element.

When an element is clicked, JavaScript fetches it from 'https://services.tmmumbai.in/BotService/fetchUrl?productCode=XXX', where 'XXX' must be replaced with the code of the desired product.
The product code is not exposed directly in the DOM either.

However, it looks like you can extract it from the img src.
Here is the code that worked for me, but it needs to be tested on different inputs:

import re
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


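def get_product_url(product_code):
    # Query the same service that the page's JavaScript calls on click.
    r = requests.get(f"https://services.tmmumbai.in/BotService/fetchUrl?productCode={product_code}", timeout=10)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    return r.json()["URL"]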


def SCHLproject(driver, query, timeout=10):
    driver.get(f"https://www.truemeds.in/search/{query}")

    els = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.XPATH, "//p/following-sibling::div//div/img"))
    )
    urls = []
    for el in els:
        src = el.get_attribute("src")
        re_res = re.search(r"ProductImage/([^/]+)/", src)
        if re_res:
            product_code = re_res.group(1)
        else:
            raise RuntimeError(f"Product code is not found in the img src: {src}")
        urls.append(get_product_url(product_code))
    return urls


chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)

try:
    print(SCHLproject(driver, "naxdom"))
finally:
    driver.quit()
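
The product-code extraction step above can be tried in isolation, without a browser. Below is a minimal sketch that reuses the same regular expression. The helper name extract_product_code and the sample src values are hypothetical; they only illustrate the '.../ProductImage/<code>/...' shape that the pattern expects.

```python
import re

# Same pattern as in the code above: capture the path segment after "ProductImage/".
PRODUCT_CODE_RE = re.compile(r"ProductImage/([^/]+)/")


def extract_product_code(src):
    """Return the product code found in an img src, or None if there is none.

    Hypothetical helper, not part of the original answer.
    """
    match = PRODUCT_CODE_RE.search(src or "")
    return match.group(1) if match else None


# Made-up URLs, used only to show the expected shape of the input.
sample_ok = "https://cdn.example.invalid/ProductImage/ABC-123/thumb_0.png"
sample_bad = "https://cdn.example.invalid/placeholder.png"

print(extract_product_code(sample_ok))
print(extract_product_code(sample_bad))
```

Returning None instead of raising is a design choice that differs from the code above: it lets the caller skip images (placeholders, for example) that carry no product code.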

Please note that I've changed the locator to XPath. Classes like 'sc-452fc789-2 eIXiYR' can be tricky to rely on: such class names are often generated automatically by frontend frameworks, so they may change at any time.
The suggested XPath can't be called a 'reliable' locator either, but it is definitely better than dynamic classes.
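
As a side note on the original locator: By.CLASS_NAME expects a single class name, so passing the two space-separated classes 'sc-452fc789-2 eIXiYR' will not match the element. If such classes have to be used anyway, one option is a CSS selector that chains them. Here is a minimal sketch; the helper name classes_to_css is hypothetical.

```python
def classes_to_css(class_attr):
    """Turn a space-separated class attribute into a chained CSS selector.

    Hypothetical helper. It does no escaping, so it assumes plain class names
    (letters, digits, hyphens, not starting with a digit).
    """
    names = class_attr.split()
    if not names:
        raise ValueError("no class names given")
    return "".join(f".{name}" for name in names)


selector = classes_to_css("sc-452fc789-2 eIXiYR")
print(selector)

# With Selenium, the selector could then be used like this:
# driver.find_element(By.CSS_SELECTOR, selector)
```

This only fixes the syntax of the lookup; the generated class names can still change between site builds.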

The provided code returns a list of endpoints for the displayed products; each entry does not include the website address itself.
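
If absolute links are needed, the returned endpoints can be joined with the site address. Below is a minimal sketch using urljoin from the standard library. BASE_URL is taken from the search URL used in the code above; the sample paths are made up, since the exact shape of the service's response is an assumption here.

```python
from urllib.parse import urljoin

# Site address, taken from the search URL used in the code above.
BASE_URL = "https://www.truemeds.in/"


def to_absolute(paths, base=BASE_URL):
    """Join each relative endpoint with the site address (hypothetical helper)."""
    return [urljoin(base, path) for path in paths]


# Made-up endpoints; urljoin handles both a leading slash and no leading slash.
sample_paths = ["/medicine/example-product-123", "medicine/another-example-456"]
print(to_absolute(sample_paths))
```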