Handling pagination in python playwright


I am trying to scrape https://booking.com with Playwright and Python, but I don't know how to scrape through multiple result pages. How do I solve this pagination problem? How can I loop over the page numbers at the bottom and scrape the data on the other pages? Here's my code:

from playwright.sync_api import sync_playwright
import pandas as pd

def main():
    with sync_playwright() as p:
        checkin_date = '2023-12-10'
        checkout_date = '2023-12-18'
        page_url= f'https://www.booking.com/searchresults.html?ss=Medina%2C+Saudi+Arabia&label=gog235jc-1DCAEoggI46AdIM1gDaMQBiAEBmAExuAEXyAEP2AED6AEB-AECiAIBqAIDuAL4_M2qBsACAdICJDIyNmY5NThlLTdkNjctNDg2Yi05ZDMzLWY3M2JhZmRkZDdhNtgCBOACAQ&aid=397594&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=-3092186&dest_type=city&checkin={checkin_date}&checkout={checkout_date}&group_adults=1&no_rooms=1&group_children=0&sb_travel_purpose=leisure'
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(page_url,timeout=6000000)

        hotels = page.locator('//div[@data-testid="property-card"]').all()
        print(f'There are: {len(hotels)} hotels.')

        hotels_list =[]
        for hotel in hotels:
            hotel_dict = {}
            hotel_dict['hotel'] = hotel.locator('//div[@data-testid="title"]').inner_text()
            hotel_dict['price'] = hotel.locator('//span[@data-testid="price-and-discounted-price"]').inner_text()
            hotel_dict['Nights'] = hotel.locator('//div[@data-testid="availability-rate-wrapper"]/div[1]/div[1]').inner_text()
            hotel_dict['tax'] = hotel.locator('//div[@data-testid="availability-rate-wrapper"]/div[1]/div[3]').inner_text()
            hotel_dict['score'] = hotel.locator('//div[@data-testid="review-score"]/div[1]').inner_text()
            hotel_dict['distance'] = hotel.locator('//span[@data-testid="distance"]').inner_text()
            hotel_dict['avg review'] = hotel.locator('//div[@data-testid="review-score"]/div[2]/div[1]').inner_text()
            hotel_dict['reviews count'] = hotel.locator('//div[@data-testid="review-score"]/div[2]/div[2]').inner_text().split()[0]

            hotels_list.append(hotel_dict)

        df = pd.DataFrame(hotels_list)
        df.to_excel('hotels_list.xlsx', index=False) 
        df.to_csv('hotels_list.csv', index=False) 
        browser.close()
        
if __name__ == '__main__':
    main()

There is 1 answer below


You can create a new page and also operate on it asynchronously.

So, what you could do is extract the links and then feed them to your newly created pages. For that, create an asynchronous function which:

  • processes one results page, i.e. extracts all the links from that URL
  • returns the links

In the main function, where you have the links at hand, you would call that function with a newly created page as a parameter and feed it one of the URLs. Repeat that for all of the links, add the results to a list, and await the list. You're done.
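
A minimal sketch of that approach using playwright.async_api, assuming you already have a list of result-page URLs to process; the extract_links helper, the placeholder URL, and the property-card link selector are illustrative, not taken from the question:

import asyncio
from playwright.async_api import async_playwright

async def extract_links(page, url):
    # Process one results page: navigate to the URL and
    # collect the property links it contains.
    await page.goto(url)
    return await page.locator(
        '//div[@data-testid="property-card"]//a'
    ).evaluate_all('els => els.map(e => e.href)')

async def main():
    # Result-page URLs you want to process -- placeholder here.
    result_pages = ['https://www.booking.com/searchresults.html?...']

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()

        # One newly created page per URL; gather awaits all of them together.
        tasks = [extract_links(await context.new_page(), url) for url in result_pages]
        all_links = await asyncio.gather(*tasks)

        print(all_links)
        await browser.close()

asyncio.run(main())

Each URL gets its own page from the same browser context, so the pages are processed concurrently, and asyncio.gather collects the returned links into one list.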