Scrapy Splash with xpath not returning any results


The page I'm trying to scrape is https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate

In the browser's developer console, the XPath returns the text node that holds the title of the post.


However, when I run the Scrapy spider, the same XPath does not match anything and the title comes back as None:

yield SplashRequest("https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate", self.parse_post, args={'wait': 2})

def parse_post(self, response):
  title = response.xpath('//div[contains(@class, "simplified-forums__discussion")]//div[contains(@class, "simplified-forums__discussion__first-post")]//div[contains(@class, "simplified-forums__card__content")]//h1/text()').get()
  print(title)
2023-11-01 00:16:11 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.biggerpockets.com/forums/49/topics/276013-interest-rate>
None

When I access

http://localhost:8050/render.html?url=https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate

the page renders fine as well. I am not sure what exactly is wrong, because I am confident that the XPath is correct.
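
One way to narrow this down is to look at the raw HTML that Splash returns rather than the rendered page, and check whether the class names used in the XPath appear in it at all. The sketch below is only an illustration: the helper names (splash_render_url, fetch_rendered_html, report_markers) are hypothetical, and it assumes a Splash instance at http://localhost:8050 as in the URL above, using the render.html endpoint with the same wait value passed to SplashRequest.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Splash instance, as in the question.
SPLASH = "http://localhost:8050"


def splash_render_url(target_url, wait=2):
    """Build a render.html URL that mirrors SplashRequest(args={'wait': wait})."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH}/render.html?{query}"


def fetch_rendered_html(target_url, wait=2, timeout=60):
    """Download the HTML that Splash produces for target_url (needs Splash running)."""
    with urlopen(splash_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def report_markers(html, markers):
    """Return {marker: True/False} depending on whether each string occurs in html."""
    return {marker: marker in html for marker in markers}
```

With Splash running, calling report_markers(fetch_rendered_html(url), [...]) with the class names from the XPath shows which of them are missing from the HTML the spider actually receives. If a class is absent, either the page needs a longer wait or the element is not where the XPath expects it.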

If I am missing anything, please help me out.


There is 1 best solution below

Manusha On

As I mentioned in the comment, your XPath seems to be wrong. Try selecting the h1 directly by its own class instead of going through the chain of nested divs:

import scrapy


class BiggerPocketsSpider(scrapy.Spider):
    name = 'biggerpockets'
    start_urls = ['https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate']

    def parse(self, response):
        # Select the <h1> by its own class rather than through nested <div> ancestors.
        title = response.xpath("//h1[@class='simplified-forums__topic-content__title']/text()").get()
        print("-------Extracted text-----------------")
        print(title)
        print("------------------------")
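
To illustrate why a long chained XPath can return None while a direct selector matches, here is a small sketch using only the standard library. The markup is hypothetical and heavily simplified. It assumes, without confirmation from the real page, that the h1 sits outside the simplified-forums__card__content div. ElementTree supports only a subset of XPath (no contains()), so exact class matches are used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page structure and title text are assumptions.
SAMPLE = """
<html><body>
  <div class="simplified-forums__discussion">
    <h1 class="simplified-forums__topic-content__title">Cap Rate vs Interest Rate</h1>
    <div class="simplified-forums__discussion__first-post">
      <div class="simplified-forums__card__content"><p>Post body</p></div>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Chained selector: every ancestor condition must hold, so one wrong
# assumption about nesting makes the whole expression match nothing.
nested = root.find(
    ".//div[@class='simplified-forums__discussion__first-post']"
    "//div[@class='simplified-forums__card__content']//h1"
)

# Direct selector: depends only on the h1's own class.
direct = root.find(".//h1[@class='simplified-forums__topic-content__title']")

nested_title = nested.text if nested is not None else None
direct_title = direct.text if direct is not None else None
print(nested_title)
print(direct_title)
```

In this sample the chained selector finds nothing while the direct one returns the heading text. That is the same symptom as in the question, though the actual cause on the live page may differ.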
