Python: Getting the current page URL while paging through results


I'm writing a Python script that collects page URLs: it records the current page's URL, goes to the next page, and records that URL too.

I can confirm that the browser starts and loads the start page, but after that nothing happens.

Example start page:

`https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1`

The URLs I want to extract are those of the following 4 pages:

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=2

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=3

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=4

I wrote the script below.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.common.exceptions import NoSuchElementException
    import pandas as pd
    from time import sleep
    import time


    options = Options()
    driver = webdriver.Chrome('path',options=options)


    pageURL = 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/'
    driver.get(pageURL)
    sleep(3)


    elem_urls = []


    while True:
        url = driver.current_url

        for urls in url:
            elem_urls.append(urls)

        try:
            next_button = driver.find_elemenent_by_class_name('f-list-paging__next')
            next_button.click()
            sleep(3)

        except Exception:
            break

There is 1 solution below


To extract the links for the pages, use WebDriverWait with visibility_of_all_elements_located() so the pagination links are visible before you read their href attributes. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.f-list-paging-num__link")))])
    
  • Using XPATH:

    driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@class, 'f-list-paging-num__link')]")))])
    
  • Console Output:

    ['https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=1', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=2', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=3', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=4']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
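
The click-through loop from the question can also be made to work. As posted, it stops after the first pass for two reasons: `find_elemenent_by_class_name` is misspelled, so the call raises `AttributeError`, which the broad `except Exception` silently turns into `break`; and `for urls in url` iterates over the characters of the URL string instead of appending the URL itself. The sketch below is one possible fix, not a tested scraper for this site. `collect_page_urls`, `max_pages`, and `pause` are hypothetical names, and the class name `f-list-paging__next` is taken from the question on the assumption that it matches the site's "next" link. It relies on `find_elements`, which returns an empty list when nothing matches, so the last page can be detected without catching exceptions.

```python
import time


def collect_page_urls(driver, next_locator, max_pages=50, pause=3.0):
    """Record driver.current_url on each page, clicking "next" until it is gone.

    driver       -- a Selenium WebDriver (anything with current_url and find_elements)
    next_locator -- (by, value) pair, e.g. (By.CLASS_NAME, "f-list-paging__next")
    max_pages    -- hypothetical safety cap, in case the "next" link never disappears
    pause        -- seconds to wait after each click (the question used 3)
    """
    urls = []
    for _ in range(max_pages):
        # Append the whole URL string; do not iterate over its characters.
        urls.append(driver.current_url)

        # find_elements returns [] when there is no match, so no try/except is needed.
        next_buttons = driver.find_elements(*next_locator)
        if not next_buttons:
            break

        next_buttons[0].click()
        time.sleep(pause)
    return urls


# Usage with a real browser (assumes Selenium 4 and a working Chrome driver):
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Chrome()
#   driver.get("https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/")
#   print(collect_page_urls(driver, (By.CLASS_NAME, "f-list-paging__next")))
#   driver.quit()
```

The fixed `time.sleep` mirrors the question; an explicit `WebDriverWait`, as in the answer above, is usually more reliable than a fixed delay. If the site keeps a disabled "next" link on the last page, the loop will run until `max_pages` is reached, so the stop condition may need adjusting.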