Trouble using XPath "starts-with" to parse XHTML


I'm trying to parse a web page to get the posts from a forum.
Each message starts with an element in the following format:

<div id="post_message_somenumber">

and I only want to get the first one.

I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this. Does anyone have suggestions?


There are 3 best solutions below

0
On

I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions

If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.

Selecting elements that are in a default namespace is the most frequently asked question about XPath. If you search for "XPath default namespace" on SO or elsewhere on the internet, you'll find many sources with the correct solution.

Generally, a special method must be called that binds a prefix (say "x") to the URI of the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".

Here is a good answer on how to do this in C#.

Read the documentation of your language or XPath engine to see how something similar should be done in your specific environment.
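
As an illustration of the prefix binding described above, here is a minimal sketch in Java using the JDK's javax.xml.xpath package rather than YQL. The class name NamespacedXPathSketch, the helper firstPostId, the prefix "x" and the sample page are all assumptions made for this example; only the XHTML namespace URI and the post_message_ id pattern come from the question's context.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespacedXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample page: every element is in the default XHTML namespace.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div id=\"post_message_1\">first</div>"
        + "<div id=\"sidebar\">other</div>"
        + "<div id=\"post_message_2\">second</div>"
        + "</body></html>";

    // Returns the id of the first node selected by the expression, or null if nothing matches.
    static String firstPostId(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the parser ignores namespaces
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the (arbitrary) prefix "x" to the XHTML namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "x" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("x").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            Element match = (Element) xpath.evaluate(expression, doc, XPathConstants.NODE);
            return match == null ? null : match.getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Prefixed element name: finds the first post.
        System.out.println(firstPostId(SAMPLE, "(//x:div[starts-with(@id, 'post_message_')])[1]"));
        // Unprefixed element name: matches nothing in a namespaced document.
        System.out.println(firstPostId(SAMPLE, "(//div[starts-with(@id, 'post_message_')])[1]"));
    }
}
```

With this sample, the prefixed expression should select post_message_1, while the unprefixed one selects nothing, which matches the symptom described in the question. Other XPath engines expose the same idea through their own API.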

0
On

I think I have a solution that does not require dealing with namespaces.

Here is an expression that selects all matching divs:

//div[@id[starts-with(.,"post_message")]]

But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:

(//div[@id[starts-with(.,"post_message")]])[1]

These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.

It works great for me in PowerShell:

# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'

# Run the XPath selection of all matching divs
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')

Result:

id
--
post_message_somenumber
post_message_somenumber2

Or, for just the first match:

# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')

Result:

id
--
post_message_somenumber
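
The parentheses in the "first match" expression matter once the matching divs sit under different parents. The following Java sketch, using the JDK's javax.xml.xpath package, compares the forms. The class name FirstMatchSketch, the helper matchingIds and the sample document are hypothetical, and the sample uses no namespace, just like the PowerShell sample above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstMatchSketch {
    // Hypothetical sample: matching divs spread over two parent elements.
    static final String SAMPLE =
        "<root>"
        + "<section><div id=\"post_message_1\"/><div id=\"post_message_2\"/></section>"
        + "<section><div id=\"other\"/><div id=\"post_message_3\"/></section>"
        + "</root>";

    // Returns the id attribute of every element the expression selects, in document order.
    static List<String> matchingIds(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // All matches: the dot form and the starts-with(@id, ...) form select the same nodes.
        System.out.println(matchingIds(SAMPLE, "//div[@id[starts-with(.,'post_message')]]"));
        System.out.println(matchingIds(SAMPLE, "//div[starts-with(@id,'post_message')]"));
        // Parenthesized: the first match in the whole document.
        System.out.println(matchingIds(SAMPLE, "(//div[@id[starts-with(.,'post_message')]])[1]"));
        // Not parenthesized: the first matching div under each parent.
        System.out.println(matchingIds(SAMPLE, "//div[@id[starts-with(.,'post_message')]][1]"));
    }
}
```

On this sample the parenthesized expression yields only post_message_1, whereas the unparenthesized one yields post_message_1 and post_message_3, because the [1] is then applied per parent element.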
0
On
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private List<WebElementFacade> ListOfExpiredUsersDetails;

This one gives a list of all div elements on the page whose id starts with expiredUserDetails and whose text contains Details.
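
To check such a combined predicate outside a browser, the same expression can be evaluated with the JDK's javax.xml.xpath package. This is only a sketch: the class name CombinedPredicateSketch, the helper matchingIds and the sample markup are hypothetical and not taken from any real page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CombinedPredicateSketch {
    // The expression from the answer above, unchanged.
    static final String EXPRESSION =
        "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]";

    // Hypothetical sample: only the first div satisfies both conditions.
    static final String SAMPLE =
        "<root>"
        + "<div id=\"expiredUserDetails1\">User Details</div>"
        + "<div id=\"expiredUserDetails2\">Summary</div>"
        + "<div id=\"activeUserDetails\">User Details</div>"
        + "</root>";

    // Returns the id attribute of every element the expression selects, in document order.
    static List<String> matchingIds(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingIds(SAMPLE, EXPRESSION));
    }
}
```

On this sample only expiredUserDetails1 is selected: the second div has the right id prefix but the wrong text, and the third has the right text but the wrong id prefix.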