I am using Scrapy to extract information from a website. My goal is to pull the names of golf clubs, their prices, etc., track the cost over the winter, and buy what I want when the price goes down.
So far I have it pulling the club name, but it returns the same name 38 times. (There are 38 clubs on the first page.)
Why does it print the same name every time rather than the next one? I based this on an example I did in a course. The top block of code is the one from my course; the second is mine.
import scrapy

class Spiderbook0Spider(scrapy.Spider):
    name = "spiderbook0"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css('article.product_pod')  # Get all the books on the first page
        for book in books:  # Get a single book
            print(book.css('h3 a::text').get())
--------------- My Code -----------------
import scrapy

class WedgepriceSpider(scrapy.Spider):
    name = "wedgeprice"
    allowed_domains = ["golftown.com"]
    start_urls = ["https://golftown.com/en-CA/clubs/wedges/"]

    def parse(self, response):
        wedges = response.css("div.product-tile-top > div.product-image > a.thumb-link")
        print("***********************************")
        print("***********************************")
        print(wedges)
        for wedge in wedges:
            print(response.xpath("//*[@class = 'name-link']/@title").get())
        print("***********************************")
        print("***********************************")
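This happens because inside your for loop you run the XPath query on response, i.e. from the root of the HTML document, on every iteration. Since .get() returns only the first match, every pass through the loop prints the same first name; the loop variable wedge is never used.
What you want to do instead is first select a parent element that occurs once per product (the same number of times as the child you are trying to print), then, inside the loop, use a relative XPath expression from that parent to get the value and print it to the terminal. Note that a relative XPath starts with a dot (.//); a path that starts with // searches from the document root even when you call it on a sub-selector. CSS selectors called on a sub-selector, like book.css(...) in the course example, are already relative, which is why that code works.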
For example: