Scraping the feature image from this website returns `data:image/gif;base64,R0` instead of the image URL


I'm using Scrapy and the Scrapy shell in Python to scrape the feature image from https://www.thrillist.com/travel/nation/all-the-ways-to-cool-off-in-austin, but instead of the image's `src` URL it returns `data:image/gif;base64,R0`. Can anyone tell me how to fix this so I get the actual `src` of the image?

Here is my code:

Feature_Image = [i.strip() for i in response.xpath('//*[@id="main-content"]/article/div/div/div[2]/div[1]/picture/img/@src').getall()][0]
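A `data:image/gif;base64,...` value in `src` usually means the page lazy-loads its images: the HTML ships a tiny inline GIF as a placeholder, and JavaScript swaps in the real URL later, which Scrapy never runs. The sketch below shows the check using only the standard library's `html.parser`. The sample HTML, the class `ImgAttrCollector`, and the helper `real_image_url` are hypothetical, and it assumes (as the second answer found) that the real URL sits in a `data-src` attribute. In Scrapy you would apply the same `data:` test to the strings returned by `response.xpath(...).get()`.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Collect the attributes of every <img> tag fed to the parser (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; value is None for bare attributes
            self.images.append(dict(attrs))


def real_image_url(attrs):
    """Return the first of src / data-src that is a real URL, not an inline data: URI."""
    for key in ("src", "data-src"):
        value = (attrs.get(key) or "").strip()
        if value and not value.startswith("data:"):
            return value
    return None  # only a placeholder (or nothing) was found


# Hypothetical markup shaped like a lazy-loaded image
sample = (
    '<picture><img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=" '
    'data-src="https://example.com/feature.jpg"></picture>'
)
parser = ImgAttrCollector()
parser.feed(sample)
print(real_image_url(parser.images[0]))  # https://example.com/feature.jpg
```

Checking `src` first keeps pages that don't lazy-load working unchanged; `data-src` is only used when `src` is a placeholder.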


Answer by Barry the Platipus (accepted):

The biggest image on that page should be the one marked for desktop (here via a `data-size="desktop"` attribute on a `<source>` element inside the `<picture>`); that's just common-sense logic. So why not locate its source directly, like this?

pic = response.xpath('//picture[@data-testid="picture-tag"]//source[@data-size="desktop"]/@srcset').get()

The result is the URL of the largest size of that page's poster image:

https://assets3.thrillist.com/v1/image/3086882/1584x1056/crop;webp=auto;jpeg_quality=60;progressive.jpg
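If that `srcset` ever lists several candidates (for example `small.jpg 414w, large.jpg 1584w`), `.get()` returns the whole comma-separated string rather than a single URL. Here is a minimal sketch for picking the largest candidate, assuming no URL contains a comma and all descriptors use the same unit (`w` widths or `x` densities). `largest_srcset_candidate` is a hypothetical helper, not part of Scrapy.

```python
def largest_srcset_candidate(srcset):
    """Return the URL with the largest w/x descriptor in a srcset string, or None."""
    best_url, best_size = None, -1.0
    for candidate in srcset.split(","):  # assumes URLs themselves contain no commas
        parts = candidate.strip().split()
        if not parts:
            continue
        url, size = parts[0], 1.0  # a candidate without a descriptor counts as 1x
        if len(parts) > 1 and parts[1][-1] in "wx":
            size = float(parts[1][:-1])
        if size > best_size:
            best_url, best_size = url, size
    return best_url


print(largest_srcset_candidate("small.jpg 414w, large.jpg 1584w"))  # large.jpg
```

With a single-URL `srcset` like the one above, the helper simply returns that URL, so it is safe to apply either way.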
Answer by Alexander:

It looks like the `img` tag has a `data-src` attribute that holds the link followed by some semicolon-separated image options. Splitting that text and keeping the first section gets you the link.

>>> link = response.xpath("//div[@data-element-type='ParagraphMainImage']//img/@data-src").get().split(";")[0]
>>> link
'https://assets3.thrillist.com/v1/image/3086882/414x310/crop'

You can manually add `.jpg` to the end if you want to make it clear what type of image it is. The link works with and without the extension.
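One caveat with that one-liner: `.get()` returns `None` when the XPath matches nothing, and calling `.split` on `None` raises `AttributeError`. A small sketch that guards against this and optionally appends the extension follows. `clean_image_url` is a hypothetical name, and the sample `data-src` value is modeled on the URL from the first answer rather than copied from the page.

```python
def clean_image_url(data_src, extension=None):
    """Drop the ';'-separated image options from a data-src value; optionally add an extension."""
    if not data_src:  # .get() returns None when the XPath matched nothing
        return None
    url = data_src.split(";")[0].strip()
    if extension and not url.endswith(extension):
        url += extension
    return url


# Hypothetical data-src value, modeled on the URL shown in the first answer
raw = "https://assets3.thrillist.com/v1/image/3086882/414x310/crop;webp=auto;jpeg_quality=60;progressive.jpg"
print(clean_image_url(raw))          # https://assets3.thrillist.com/v1/image/3086882/414x310/crop
print(clean_image_url(raw, ".jpg"))  # https://assets3.thrillist.com/v1/image/3086882/414x310/crop.jpg
print(clean_image_url(None))         # None
```

In the Scrapy shell you would pass the XPath result straight in, e.g. `clean_image_url(response.xpath("//div[@data-element-type='ParagraphMainImage']//img/@data-src").get(), ".jpg")`.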