How to wait until all new (dynamic) elements are loaded in Selenium WebDriver?


I'm scraping release dates of movies on IMDb (for example, https://www.imdb.com/title/tt0929632/releaseinfo/). On this page, once it has fully loaded, only 5 rows of release dates are displayed. To get more rows, the "show more" buttons have to be clicked. My goal is to download the HTML that contains the full release date info (I'll scrape what I need from it later).

I use Selenium WebDriver because of the interactive buttons. The workflow is:

  1. Get the url, wait until the initial row elements are located
  2. Wait until the button is clickable before clicking on it
  3. Wait until all new row elements are loaded
  4. Download the html

However, I'm stuck at step 3. I'm using the same wait condition as in step 1:

all_rows = WebDriverWait(driver, waittime).until(EC.presence_of_all_elements_located((By.XPATH, '//div[@data-testid="sub-section-releases"]//li[@data-testid="list-item"]')))

but it doesn't always work. If I run the same code multiple times, the results vary: sometimes all rows are in the downloaded HTML, sometimes only some of them, and other times only the initial 5 rows.

Does anyone know why this happens? I suspect the wait condition is the problem, but I don't understand why. How can I modify my code to wait until all new row elements are loaded, without knowing in advance how many there will be?

Below is the code (Python):

# library
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# build a Chrome driver with the default options
def opening_default_driver():
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument('start-maximized')
    chrome_options.add_argument("--disable-infobars")
    chrome_options.add_argument("--disable-extensions")
    # a value of 2 blocks the corresponding permission prompt
    chrome_options.add_experimental_option("prefs", {
        "profile.default_content_setting_values.media_stream_mic": 2,
        "profile.default_content_setting_values.media_stream_camera": 2,
        "profile.default_content_setting_values.geolocation": 2,
        "profile.default_content_setting_values.notifications": 2
    })
    driver = webdriver.Chrome(options=chrome_options)
    return driver

# download the html
driver = opening_default_driver()
waittime = 20
url = 'https://www.imdb.com/title/tt0929632/releaseinfo/'

for t in range(0,20):
    try:
        driver.get(url)
    except Exception as e:
        print(f'error getting url {url}: {e}')
    else:
        all_rows = WebDriverWait(driver, waittime).until(
            EC.presence_of_all_elements_located((By.XPATH, '//div[@data-testid="sub-section-releases"]//li[@data-testid="list-item"]'))
        )
        try:
            button2_parent = driver.find_element(By.CSS_SELECTOR, '.ipc-see-more.sc-68fe39e1-0.icyVUF.chained-see-more-button-releases.sc-2e6342b6-1.gXymKs')
            button2 = button2_parent.find_element(By.TAG_NAME, 'button')
        except NoSuchElementException:
            try:
                button1_parent = driver.find_element(By.CSS_SELECTOR, '.ipc-see-more.sc-f06d8e21-0.jBTuow.single-page-see-more-button-releases')
                button1 = button1_parent.find_element(By.TAG_NAME, 'button')
            except NoSuchElementException:
                print("can't find either button")
            else:
                button1.click()
        else:
            button2.click()
        finally:
            try:
                all_rows = WebDriverWait(driver, waittime).until(
                    EC.presence_of_all_elements_located((By.XPATH, '//div[@data-testid="sub-section-releases"]//li[@data-testid="list-item"]'))
                )
            except TimeoutException:
                print('TimeoutException while waiting for the rows')
            else:
                f = f'/Users/lalala/Desktop/website{t}.html'
                html = driver.page_source
                with open(f, 'w+', encoding='utf-8') as file:
                    file.write(html)

driver.quit()

There is 1 best solution below

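I figured it out myself in the end @_@ Waiting on EC.staleness_of(button) after the click solves the problem (at least it seems to). The original condition, presence_of_all_elements_located, is already satisfied by the 5 initial rows, so it returns right away. staleness_of instead waits until the clicked button is no longer attached to the DOM, which on this page seems to happen only once the new rows are in.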
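
In case the button does not go stale on some page, a possible fallback is a custom wait condition that waits until the number of rows stops growing. This is only a sketch under assumptions: `RowCountStable`, `more_than` and `stable_polls` are made-up names, not part of Selenium. It relies only on the fact that `WebDriverWait.until()` accepts any callable that takes the driver, and polls it (every 0.5 s by default) until it returns something truthy.

```python
class RowCountStable:
    """Hypothetical wait condition: truthy once the row count stops changing.

    Meant to be passed to WebDriverWait(...).until(), which calls it with the
    driver on every poll and stops at the first truthy return value.
    """

    def __init__(self, locator, more_than=0, stable_polls=3):
        # locator is a (by, value) tuple; By.XPATH is the plain string "xpath"
        self.locator = locator
        # require more rows than were present before the click
        self.more_than = more_than
        # number of consecutive polls with an unchanged count
        self.stable_polls = stable_polls
        self._last_count = -1
        self._unchanged = 0

    def __call__(self, driver):
        rows = driver.find_elements(*self.locator)
        count = len(rows)
        if count == self._last_count:
            self._unchanged += 1
        else:
            # the count changed, so start counting quiet polls again
            self._last_count = count
            self._unchanged = 0
        if count > self.more_than and self._unchanged >= self.stable_polls:
            return rows
        return False
```

With a real driver it would be used after the click, along the lines of `WebDriverWait(driver, waittime).until(RowCountStable(locator, more_than=len(all_rows)))`, where `all_rows` is the list from step 1 and `locator` is the same `(By.XPATH, ...)` tuple. A count that stops changing is a heuristic rather than proof that loading has finished, so `stable_polls` may need to be raised on a slow connection.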