
How to get non-element text adjacent to a tag using Scrapy?


I am trying to scrape a page using the Scrapy framework.

<div class="info"><span class="label">Establishment year</span> 2014</div>

The tag I want to deal with looks like the one above. I want to get the value 2014. I can't use the info or label classes, as they are common throughout the page.

So I tried the XPath expressions below, but I am getting null:

response.xpath("//span[contains(text(),'Establishment year')]/following-sibling").get()

response.xpath("//span[contains(text(),'Establishment year')]/following-sibling::text()").get()

Any clue what can be the issue?


There are 2 best solutions below

2
dram95 On

Since you are trying to extract the text between the tags, you should put the tag at the end of the path. I don't know which website you are trying to scrape, but here is an example where I scrape the text inside the 'a' tag on http://books.toscrape.com/. Here is the code I used:

response.xpath("(//h3)[1]/a/text()").extract_first()

In your second line of code, you did not use the text-extraction function correctly. The form you are using is for CSS selectors. For XPath it would be /text(), not ::text(). For your code, I think you should try one of these options. Let me know if it helps.

response.xpath("//span[contains(text(),'Establishment year')]/div/text()").get()

or

response.xpath("//span[contains(text(),'Establishment year')]/span/text()").get()
1
Gallaecio On

Extract direct text children (/text()) from the parent element:

>>> from parsel import Selector
>>> selector = Selector(text='<div class="info"><span class="label">Establishment year</span> 2014</div>')
>>> selector.xpath('//*[@class="info"]/text()').get()
' 2014'