Scrapy returning same value over iterating

Question

38 Views Asked by Brian Hamilton At 20 November 2023 at 02:26

I am using Scrapy to extract information from a website. My goal is to pull the names of golf clubs, their prices, and so on, track the cost over the winter, and buy what I want when the price goes down.

So far it pulls the club name, but it prints the same name 38 times. (There are 38 clubs on the first page.)

Why does it print the same name each time rather than the next one? I based this spider on an example from a course. The first block of code is the one from my course; the second is mine.

import scrapy

class Spiderbook0Spider(scrapy.Spider):
    name = "spiderbook0"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css('article.product_pod')  # Get all the books on the first page
        for book in books:  # Get a single book
            # Query relative to the current book, so each iteration prints a different title
            print(book.css('h3 a::text').get())

--------------- My Code -----------------

import scrapy


class WedgepriceSpider(scrapy.Spider):
    name = "wedgeprice"
    allowed_domains = ["golftown.com"]
    start_urls = ["https://golftown.com/en-CA/clubs/wedges/"]

    def parse(self, response):
        wedges = response.css("div.product-tile-top > div.product-image > a.thumb-link ")
        print("***********************************")
        print("***********************************")
        print(wedges)
        for wedge in wedges:
            print(response.xpath("//*[@class = 'name-link']/@title").get())
        print("***********************************")
        print("***********************************")

Original Q&A

There is 1 best solution below

**Alexander** · Answer 1 · 2023-11-20T05:58:44.480000

This happens because, inside your for loop, you run the XPath query against the whole response (the root of the HTML document) on every iteration, so .get() returns the same first match each time.

Instead, first select a parent element that occurs once per item you want to print. Then, inside the loop, call a relative XPath expression (one starting with ".//") on that parent to get the value for that item and print it to the terminal.

For example:

import scrapy


class WedgepriceSpider(scrapy.Spider):
    name = "wedgeprice"
    allowed_domains = ["golftown.com"]
    start_urls = ["https://golftown.com/en-CA/clubs/wedges/"]


    def parse(self, response):
        print("***********************************")
        print("***********************************")
        for tile in response.css(".product-tile"):
            print(tile.xpath(".//*[@class = 'name-link']/@title").get())
        print("***********************************")
        print("***********************************")

OUTPUT

2023-11-19 21:55:23 [scrapy.core.engine] INFO: Spider opened
2023-11-19 21:55:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-11-19 21:55:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-11-19 21:55:23 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.golftown.com/en-CA/clubs/wedges/> from <GET https://golftown.com/en-CA/clubs/wedges/>
2023-11-19 21:55:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.golftown.com/en-CA/clubs/wedges/> (referer: None)
***********************************
***********************************
Milled Grind 4 Wedge with Steel Shaft
Glide 4.0 Wedge with Steel Shaft
RTX 4.0 Tour Satin Wedge with Steel Shaft
RTX 6 ZipCore Tour Satin Wedge with Steel Shaft
Milled Grind 3 Black Wedge with Steel Shaft
Milled Grind Wedge with Steel Shaft
JAWS RAW Chrome Wedge with Steel Shafts
Milled Grind 2 Hi-Toe Raw Wedge
RTX 6 ZipCore Black Satin Wedge with Steel Shaft
Mack Daddy Cavity Back Wedge with Steel Shaft
Staff Model Wedge with Steel Shaft
JAWS MD5 Platinum Chrome Wedge with Steel Shaft
Milled Grind 3 Chrome Wedge with Steel Shaft
CBX Full-Face 2 Tour Satin with Steel Shaft
SM9 Brushed Steel Wedge with Steel Shaft
King Cobra Snake Bite Wedge with Steel Shaft
SM9 Tour Chrome Wedge with Steel Shaft
PUR-S Black Wedge with Steel Shaft
JAWS RAW Chrome Wedge with Graphite Shafts
JAWS RAW Black Wedge with Steel Shafts
King Cobra Black Snake Bite Wedge with Steel Shaft
ChipR Wedge with Steel Shaft
T22 Blue Ion Wedge with Steel Shaft
S23 Copper Cobalt Wedge with Steel Shaft
S23 Satin Chrome Wedge with Steel Shaft
Smart Sole 4 S Black Wedge with Graphite Shaft
Smart Sole 4 G Black Wedge with Graphite Shaft
Smart Sole 4 C Black Wedge with Graphite Shaft
Smart Sole 4 S Black Wedge with Steel Shaft
Smart Sole 4 G Black Wedge with Steel Shaft
RTX Full-Face Black Wedge with Steel Shaft
CBX Zipcore Tour Satin Wedge with Graphite Shaft
CBX Zipcore Tour Satin Wedge with Steel Shaft
Women's CBX Zipcore Wedge with Graphite Shaft
Ladies X Act Chipper
***********************************
***********************************
2023-11-19 21:55:25 [scrapy.core.engine] INFO: Closing spider (finished)
2023-11-19 21:55:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 724,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 28484,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/301': 1,
 'elapsed_time_seconds': 2.699416,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 11, 20, 5, 55, 25, 901973),
 'httpcompression/response_bytes': 263357,
 'httpcompression/response_count': 1,
 'log_count/DEBUG': 3,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2023, 11, 20, 5, 55, 23, 202557)}
