How to select specific class with Scrapy


I am trying to scrape a page that contains specific info. The URL is https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900. I want to select the element for each carpenter by its class, so I tried response.css('div.a-artisanTease to-animate'), but it returns an empty selection. What might be the problem?

Thanks.

I've tried several different paths. I need Scrapy to select every carpenter listed on the page separately, so that I can later collect info for all the search results.


There are 2 best solutions below

1
On

The reason you can't retrieve the data with Scrapy is that this page renders its content with JavaScript. Scrapy on its own does not execute JavaScript, so it cannot help here. You need a tool that can handle JavaScript, such as Selenium or Splash, to retrieve data from this page.

I also recommend using XPath selectors instead of CSS selectors, as XPath offers many useful options for searching text in the DOM. An XPath expression for the class attribute you are targeting would be:

//div[@class='a-artisanTease to-animate']

4
On

The actual reason is that the server requires a specific cookie before it serves the full HTML for the page. Your CSS selector expression is also wrong.

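The cookie needed is "tarteaucitron=!googletagmanager=wait", and the correct CSS expression is div.a-artisanTease.to-animate. The two class names must be chained with a dot and no space; a space in a CSS selector is the descendant combinator, so your original expression looks for a <to-animate> element nested inside div.a-artisanTease.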

For example, using scrapy shell:

In [1]: fetch(scrapy.Request("https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900", headers={"cookies": "tarteaucitron=!googletagmanager=wait"}))
2023-10-17 00:43:16 [scrapy.core.engine] INFO: Spider opened
2023-10-17 00:43:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900> (referer: None)

In [2]: response.css("div.a-artisanTease.to-animate")
Out[2]:
[<Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
 <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>]