Scrapy Xpath Selector returning partial text


I have some text in a p tag that may contain additional tags, such as em, inside it. When I pass the following text to a parsel XPath selector and ask for the first p tag, it returns only part of the string.

  from parsel import Selector

  selector = Selector(text="<div><p>Hel<em>l</em>o</p><p>World!</p></div>")

  for p in selector.xpath('(//div//p//text())[1]'):
    print(p.get())

The output returned by the code is

Hel

but the expected output is Hello. What am I doing wrong here?

Answer by kerasbaz:

I believe you're looking for something like this:

from parsel import Selector

selector = Selector(text="<div><p>Hel<em>l</em>o</p><p>World!</p></div>")

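# Print every text node under the first p, one per line: Hel, l, o
for p in selector.xpath('//div/p[1]/descendant-or-self::*/text()'):
    print(p.get())

# OR join the text nodes into a single string: Hello
print("".join([x.get() for x in selector.xpath('//div/p[1]/descendant-or-self::*/text()')]))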

Depending on what you're trying to accomplish, you may want to avoid the double slashes in your XPath. See "Working with relative XPaths" in the Scrapy documentation.