Scrapy not able to scrape for the next page

Question

Asked by KY Lee on 13 December 2020 at 09:26

I want to scrape the information from the subsequent pages as well, but my code only scrapes the information from the first page.

My code is as follows:

# -*- coding: utf-8 -*-
import scrapy
from ..items import PropertyItem

class Starprop(scrapy.Spider):
name = 'starprop'
allowed_domains = ['starproperty.com']
start_urls = ['https://www.starproperty.my/to-buy/search?max_price=1000000%2B&new_launch_checkbox=on&sub_sales_checkbox=on&auction_checkbox=on&listing=For%20Sale&sort=latest&page=1']


def parse(self, response):
    item = PropertyItem ()
    property_list = response.css('.mb-4 div')

    for property in property_list:
        property_name = property.css ('.property__name::text').extract()
        property_price = property.css('.property__price::text').extract()
        property_location = property.css ('.property__location::text').extract()
        property_agent = property.css('.property__agentdetails .property__agentdetails span:nth-child(1)::text').extract()
        property_phone = property.css ('.property__agentcontacts a span::text').extract()

        item['property_name']= property_name
        item['property_price']= property_price
        item['property_location'] = property_location
        item['property_agent'] = property_agent
        item['property_phone'] = property_phone

        yield item

        next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)

Original Q&A

Yu Jiaao On 13 December 2020 at 09:36

Maybe it is an indentation problem? Try changing:

    yield item

    next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

if next_page is not None:
    yield response.follow(next_page, callback = self.parse)

to

    yield item

    next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)
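
To make the effect of the indentation concrete, here is a minimal sketch in plain Python with no Scrapy dependency. It contrasts checking for the next page inside the item loop with checking it once after the loop. The function names and the `rows` and `next_href` inputs are hypothetical stand-ins for the selectors in the question; the sketch only counts what each layout yields.

```python
def parse_follow_inside_loop(rows, next_href):
    """Pagination check indented inside the for loop."""
    for row in rows:
        yield ("item", row)
        if next_href is not None:
            # Runs once per row, so the same link is yielded repeatedly.
            yield ("follow", next_href)


def parse_follow_after_loop(rows, next_href):
    """Pagination check dedented to the level of the for statement."""
    for row in rows:
        yield ("item", row)
    if next_href is not None:
        # Runs once per page, after all rows have been yielded.
        yield ("follow", next_href)


rows = ["a", "b", "c"]  # hypothetical listings on one page
inside = list(parse_follow_inside_loop(rows, "?page=2"))
after = list(parse_follow_after_loop(rows, "?page=2"))

follows_inside = sum(1 for kind, _ in inside if kind == "follow")
follows_after = sum(1 for kind, _ in after if kind == "follow")
print(follows_inside, follows_after)  # 3 1
```

In a real spider, Scrapy's default duplicate filter would normally drop the repeated requests, so both layouts may end up crawling the same pages. The after-loop form requests each page once and still paginates when a page matches no rows.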

**gangabass** · Accepted Answer · 2020-12-13T09:47:17.867000

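The main problem is your allowed_domains: it is set to starproperty.com, but the site you are crawling is on starproperty.my, so the request for the next page is filtered out as offsite. (You need to fix your indentation too.) You also almost certainly want to create your item inside the loop, so that each listing gets its own PropertyItem: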

import scrapy
from ..items import PropertyItem


class Starprop(scrapy.Spider):
    name = 'starprop'
    # Must match the domain used in start_urls, otherwise followed requests are treated as offsite
    allowed_domains = ['starproperty.my']
    start_urls = ['https://www.starproperty.my/to-buy/search?max_price=1000000%2B&new_launch_checkbox=on&sub_sales_checkbox=on&auction_checkbox=on&listing=For%20Sale&sort=latest&page=1']

    def parse(self, response):
        property_list = response.css('.mb-4 div')

        for property in property_list:
            property_name = property.css('.property__name::text').extract()
            property_price = property.css('.property__price::text').extract()
            property_location = property.css('.property__location::text').extract()
            property_agent = property.css('.property__agentdetails .property__agentdetails span:nth-child(1)::text').extract()
            property_phone = property.css('.property__agentcontacts a span::text').extract()

            # Create a fresh item for every listing instead of reusing one object
            item = PropertyItem()
            item['property_name'] = property_name
            item['property_price'] = property_price
            item['property_location'] = property_location
            item['property_agent'] = property_agent
            item['property_phone'] = property_phone

            yield item

        # Look for the next-page link once per page, after the loop has finished
        next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

        if next_page:
            yield response.follow(next_page, callback=self.parse)
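
Why the allowed_domains value matters can be illustrated with a simplified host check. This is only a sketch of the idea behind Scrapy's offsite filtering, not its actual implementation: `is_allowed` is a hypothetical helper, and `next_url` is a shortened, made-up next-page URL on the same host as the start URL.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical next-page URL on the site from the question
next_url = "https://www.starproperty.my/to-buy/search?page=2"

print(is_allowed(next_url, ["starproperty.com"]))  # False: domain from the question
print(is_allowed(next_url, ["starproperty.my"]))   # True: domain from the accepted answer
```

With the .com value, every followed link on www.starproperty.my fails this kind of check, which is consistent with the spider stopping after the first page.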
