<div class="card-block cms">
<p>and then have a tea or coffee on the balcony of the cafeteria.</p>
<p>&nbsp;</p>
</div>
I am trying to check whether the text I crawl from a website contains a non-breaking space (&nbsp;):
texts = driver.find_element_by_xpath("//div[@class='card-block cms']")
textInDivTag = texts.text
print(textInDivTag)
if u"\xa0" in textInDivTag:
    print("yes")
My output is as follows:
and then have a tea or coffee on the balcony of the cafeteria.
As you can see, it doesn't recognize the non-breaking space.
The character is recognized, but it is converted to a normal space (u"\x20").

According to the comment in the Java Selenium source code, .text / .getText() returns the visible text and references the W3C WebDriver specification, section "11.3.5 Get Element Text".

So this behavior is probably in line with the specification, but I have not yet found the source code that specifically replaces non-breaking spaces with regular spaces. I also could not find an issue about this in the Selenium repository; you could try opening one.
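One possible workaround is to read the element's DOM textContent property instead of .text. This is a sketch, not a confirmed fix, and it assumes the browser returns textContent with the original U+00A0 character intact. The helper name contains_nbsp and the FakeElement stand-in are hypothetical; FakeElement only imitates the behavior described above so that the example runs without a browser.

```python
NBSP = "\u00a0"  # non-breaking space, the character that &nbsp; decodes to


def contains_nbsp(element):
    """Return True if the element's raw DOM text contains a non-breaking space.

    `element` is expected to behave like a Selenium WebElement, i.e. to offer
    get_attribute(name). Assumption: "textContent" is returned without the
    whitespace handling that .text applies.
    """
    raw_text = element.get_attribute("textContent") or ""
    return NBSP in raw_text


class FakeElement:
    """Hypothetical stand-in for a WebElement (no browser required)."""

    def __init__(self, text_content):
        self._text_content = text_content

    @property
    def text(self):
        # Imitates the behavior described above: NBSP comes back as a plain space.
        return self._text_content.replace(NBSP, " ")

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


element = FakeElement(
    "and then have a tea or coffee on the balcony of the cafeteria.\n\u00a0\n"
)
print(NBSP in element.text)    # False: the check against .text misses it
print(contains_nbsp(element))  # True: the check against textContent finds it
```

With a real driver, the same helper could be called on the element found by the XPath from the question, for example contains_nbsp(texts). Whether textContent suits the use case depends on the page, because it also includes text from hidden child nodes.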