Scrapy request gets some responses, but not all


I'm scraping a page that has 36 @href values under the same div XPath, but when I try to get them, even in the Scrapy shell, I get only the same 12 @href values every time, and they are not in order.

I'm using this expression: response.xpath('/html/body/div[1]/div[2]/section/div/div[3]/div[2]/div/div[2]//div//article//div[1]// a[re:test(@href,"pd")]//@href').getall()

It's from the following page: https://www.lowes.com/pl/Bottom-freezer-refrigerators-Refrigerators-Appliances/4294789499?offset=36

Best answer

It seems that part of the HTML is loaded dynamically, so Scrapy, which does not execute JavaScript, cannot see it. The data itself is present as a JSON structure within the HTML. You can try to get it like this:

import json

# get the text of the script element that holds the data
json_data = response.xpath('//script[contains(text(), "__PRELOADED_STATE__")]/text()').get()
# drop the JavaScript assignment (and a trailing semicolon, if any), then load the JSON into a Python dictionary
dict_data = json.loads(json_data.split('window.__PRELOADED_STATE__ =')[-1].strip().rstrip(';'))
items = dict_data['itemList']
print(len(items))  # prints 36 in my case
# go through the items and get the product URLs
for item in items:
    product_url = item['product']['pdURL']
    ...
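
If the script contains more JavaScript after the JSON object, a plain split followed by json.loads may fail. Below is a minimal sketch of a more tolerant parser that uses only the standard library. The helper name parse_preloaded_state and the sample script text are made up for illustration; only the key names itemList, product and pdURL are taken from the snippet above, and the real page may be structured differently.

```python
import json

MARKER = "window.__PRELOADED_STATE__ ="


def parse_preloaded_state(script_text, marker=MARKER):
    """Return the JSON value assigned after `marker`, ignoring what follows it.

    Hypothetical helper: raises ValueError if the marker is not present.
    """
    start = script_text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so trim it first.
    payload = script_text[start:].lstrip()
    # raw_decode parses exactly one JSON value and returns it together with
    # the index where it ended, so a trailing ";" or more JavaScript is ignored.
    data, _end = json.JSONDecoder().raw_decode(payload)
    return data


# Made-up sample standing in for the text of the <script> element.
sample_script = """
window.__PRELOADED_STATE__ = {"itemList": [
  {"product": {"pdURL": "/pd/example-a/1"}},
  {"product": {"pdURL": "/pd/example-b/2"}}
]};
window.somethingElse = 1;
"""

state = parse_preloaded_state(sample_script)
product_urls = [item["product"]["pdURL"] for item in state["itemList"]]
```

In a spider, you would pass the string returned by the response.xpath(...) call above to the helper instead of sample_script.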