Find element by xpath in multiple pages

I'm testing Selenium for web scraping on a website, but I have a question:

The website contains multiple pages, and the information I need is always within an element with an ID. For example, on page one, I have IDs ranging from "card0" to "card50". However, this pattern repeats on page two, starting again from "card0" and going up to "card50".

I'm trying to locate these elements with find_element(By.XPATH, ...), but I can't get it to work reliably across pages. Here's a snippet of the code:

element = driver.find_element(By.XPATH,"//*[text()[contains(.,id='card')]]")

Thank you all for the support.

There are 2 best solutions below

Assuming your HTML looks like this:

<div id="card0">...</div>
<div id="card1">...</div>
<div id="card2">...</div>
<div id="card3">...</div>
<div id="card4">...</div>
<div id="card5">...</div>
...

and you want to get all of these div elements with XPath, you can do it like this:

elements = driver.find_elements(By.XPATH, '//*[contains(@id, "card")]')

You need to use driver.find_elements (with an "s") because you expect multiple elements; find_element returns only the first match.

The same query is a bit shorter as a CSS selector:

elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
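One caveat with both locators above: contains() and *= match "card" anywhere in the ID, so an element such as id="discard" would be picked up as well. A stricter variant, sketched below, matches on the prefix instead and then filters the IDs with a regular expression. The names CARD_XPATH, CARD_CSS and is_card_id are made up for this illustration, and the card<number> pattern is an assumption based on the IDs described in the question.

```python
import re

# Prefix-based locators (standard XPath 1.0 and CSS attribute-selector syntax).
CARD_XPATH = '//*[starts-with(@id, "card")]'
CARD_CSS = '[id^="card"]'

# Assumed ID shape from the question: "card" followed by digits only.
CARD_ID = re.compile(r"card\d+")


def is_card_id(element_id):
    """Return True only for IDs of the exact form card<number>."""
    return bool(element_id) and CARD_ID.fullmatch(element_id) is not None


# With a live driver this would be used roughly like so:
#   cards = [e for e in driver.find_elements(By.CSS_SELECTOR, CARD_CSS)
#            if is_card_id(e.get_attribute("id"))]
```

The prefix locator already excludes IDs like "discard"; the regex check additionally drops IDs such as "cardholder" that share the prefix but are not numbered cards.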

A typical loop over multiple pages looks like this:

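next_page_is_present = True
while next_page_is_present:
    elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
    # Do what you want with the elements here, while still on this page
    # Check whether there is a next page ("a.next" is a placeholder selector)
    next_links = driver.find_elements(By.CSS_SELECTOR, "a.next")
    next_page_is_present = len(next_links) > 0
    if next_page_is_present:
        next_links[0].click()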
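You cannot store these elements in a variable and use them after the loop: once the browser has navigated to another page, the old references are stale and using them raises StaleElementReferenceException. Read whatever you need (text, attributes) from the elements while you are still on their page.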
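To make that concrete, here is a minimal sketch that copies each card's ID and text into plain Python tuples before moving on, and records the page number so the repeating IDs (card0 on page one, card0 on page two) stay distinguishable. The function name scrape_all_pages, the "a.next" selector and the max_pages limit are assumptions for illustration, not something from the site in the question. The locator strategy is passed as the string "css selector", which is the value of selenium's By.CSS_SELECTOR.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def scrape_all_pages(driver, card_css='[id^="card"]', next_css="a.next", max_pages=100):
    """Return a list of (page_number, element_id, text) tuples.

    next_css is a placeholder; replace it with the real next-page selector.
    """
    rows = []
    for page in range(1, max_pages + 1):
        for card in driver.find_elements(CSS, card_css):
            # Read the values now; the element reference goes stale after navigation.
            rows.append((page, card.get_attribute("id"), card.text))
        next_links = driver.find_elements(CSS, next_css)
        if not next_links:
            break  # no next-page link found, so this was the last page
        next_links[0].click()
    return rows
```

On a real site you would probably also wait after the click (for example with WebDriverWait) until the new page's cards have loaded, otherwise the loop may read the old page twice or hit stale elements.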

Here is a script I wrote for a similar paginated site. You will probably have to inspect the page and adjust the CSS selectors, but other than that it should work:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set the download directory
prefs = {
    "download.default_directory": r"F:\models",
    "download.prompt_for_download": False
}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
# chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")  # Chrome expects width,height
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_experimental_option("prefs", prefs)

# Launch the Chrome browser
driver = webdriver.Chrome(options=chrome_options)
try:
    # Navigate to the login page
    url = "url goes here"
    driver.get(url)
    # Find and click the login button once it is clickable
    login_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "a.login"))
    )
    login_button.click()
    # Wait for the login form to appear
    login_form = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "login_form"))
    )
    # Fill in the email and password fields
    email_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[email]']")
    email_input.send_keys("email goes here")
    password_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[password]']")
    password_input.send_keys("password goes here")
    # Click the sign in button
    signin_button = login_form.find_element(By.ID, "signInButton")
    signin_button.click()
    while True:
        try:
            # Find and click every download button on the current page
            download_buttons = WebDriverWait(driver, 10).until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, "span.gc-icon.gc-icon-download"))
            )
            for button in download_buttons:
                button.click()
            # Find and click the next page button once it is clickable
            next_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, "li.pagination-next.ng-scope"))
            )
            next_button.click()
        except WebDriverException:
            # Covers timeouts too (TimeoutException is a subclass): if there are
            # no more pages or the buttons are not clickable, leave the loop
            break
finally:
    # Close the browser
    driver.quit()