xpath: how to select items between item A and item B


I have an HTML page with this structure:

<big><b>Staff in:</b></big>
<br>
<a href='...'>Movie 1</a>
<br>
<a href='...'>Movie 2</a>
<br>
<a href='...'>Movie 3</a>
<br>
<br>
<big><b>Cast in:</b></big>
<br>
<a href='...'>Movie 4</a>

How do I select Movies 1, 2, and 3 using XPath? I wrote this query:

'//big/b[text()="Staff in:"]/following::a'

but it returns Movies 1, 2, 3, and 4. I guess I need a way to select the items that come after <big><b>Staff in:</b></big> but before the next <big>.

Thanks,


There are 3 best solutions below

1
On BEST ANSWER

Assuming that <big><b>Staff in:</b></big> is a unique element that we can use as an 'anchor', you can try this:

//big[b='Staff in:']/following-sibling::a[preceding-sibling::big[1][b='Staff in:']]

Basically, the XPath finds every <a> that is a following sibling of the 'anchor' <big> element mentioned above, and restricts the result to those whose nearest preceding <big> sibling is that anchor element.

Output in an XPath tester, given the markup in the question as input (with minimal adjustments to make it well-formed XML):

Element='<a href="...">Movie 1</a>'
Element='<a href="...">Movie 2</a>'
Element='<a href="...">Movie 3</a>'
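For readers who want to check the logic without an XPath tester, here is a minimal Python sketch of the same idea using only the standard library. It is not the XPath itself: xml.etree.ElementTree does not support the following-sibling or preceding-sibling axes, so the loop below emulates them by walking the siblings in order. The <root> wrapper, the self-closed <br/> tags, and the function name links_under are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element and with
# <br/> self-closed so that it parses as well-formed XML.
MARKUP = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def links_under(root, heading):
    """Return the text of every <a> whose nearest preceding <big> sibling
    has a first <b> child with the given text."""
    selected = []
    nearest_big = None  # plays the role of preceding-sibling::big[1]
    for child in root:  # direct children, in document order
        if child.tag == "big":
            nearest_big = child
        elif child.tag == "a" and nearest_big is not None:
            if nearest_big.findtext("b") == heading:
                selected.append(child.text)
    return selected


root = ET.fromstring(MARKUP)
staff_links = links_under(root, "Staff in:")
print(staff_links)  # ['Movie 1', 'Movie 2', 'Movie 3']
```

Like the XPath, this only looks at siblings, so it assumes the headings and links share one parent element.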
0
On

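To add to the above, and following the Stack Overflow question "XPath axis, get all following nodes until", here is a complete solution that I worked out in an XSLT editor. First, /*/ is used instead of //, which is typically faster because it only looks at children of the root element rather than searching the whole document (this assumes the <big> elements are direct children of the root). Second, the logic is: return every <a> that is a following sibling of the anchor <big>, provided that its nearest preceding <big> sibling is that same anchor. The count() test checks this, because the union of two node-sets contains exactly one node only when both hold the same node. This also presumes that each <big> heading is distinct.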

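The XPath looks like this (it uses "Cast in:" as the anchor, so it selects Movie 4; substitute "Staff in:" to get Movies 1, 2, and 3):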

/*/big[b="Cast in:"]/following-sibling::a [1 = count(preceding-sibling::big[1]| ../big[b="Cast in:"])]

The xslt solution looks like

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
            <h2>My Movie Collection</h2>
            <table border="1">
                <tr bgcolor="#9acd32">
                    <th>Title</th>

                </tr>
                <xsl:variable name="placeholder" select="/*/big" />
                <xsl:for-each select="$placeholder">
                    <xsl:variable name="i" select="position()" />
                    <b>
                        <xsl:value-of select="$i" />
                        <xsl:value-of select="$placeholder[$i]" />
                    </b>
                    <xsl:for-each
                        select="following-sibling::a[1 = count(preceding-sibling::big[1] | ../big[b=$placeholder[$i]])]">
                        <tr>
                            <td>
                                <xsl:value-of select="." />
                            </td>

                        </tr>
                    </xsl:for-each>
                </xsl:for-each>
            </table>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
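The stylesheet loops over every <big> heading and lists the links that belong to it. As a rough cross-check, the same grouping can be sketched in Python with the standard library. This is an illustration, not a port of the XSLT: the <root> wrapper, the self-closed <br/> tags, and the name group_links are assumptions, and headings with identical text would be merged into one group.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element and with
# <br/> self-closed so that it parses as well-formed XML.
SAMPLE = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def group_links(root):
    """Map each <big> heading's text to the <a> texts that follow it,
    up to (but not including) the next <big>."""
    groups = {}
    current = None  # text of the most recent <big> heading seen so far
    for child in root:  # direct children, in document order
        if child.tag == "big":
            current = "".join(child.itertext()).strip()
            groups.setdefault(current, [])
        elif child.tag == "a" and current is not None:
            groups[current].append(child.text)
    return groups


groups = group_links(ET.fromstring(SAMPLE))
for heading, titles in groups.items():
    print(heading, titles)
```

Any <a> that appears before the first <big> is skipped, which matches the XPath's requirement of a preceding <big> sibling.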
0
On

//a[preceding::b[text()="Staff in:"] and following::b[text()="Cast in:"]]

This returns every a that comes after the b element with the text Staff in: but before the b element with the text Cast in:.

You may need to add some more conditions to make it more specific depending on whether or not these b elements are unique on the page.
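To see why uniqueness matters, here is a small Python sketch of what the preceding:: and following:: tests amount to in document order. It uses only the standard library and emulates the axes by position in a flattened node list, since xml.etree.ElementTree does not support them. The <root> wrapper, the self-closed <br/> tags, and the name links_between are assumptions, and the sketch assumes that no b element is an ancestor or descendant of an a element.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element and with
# <br/> self-closed so that it parses as well-formed XML.
PAGE = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def links_between(root, start_text, end_text):
    """Return the text of every <a> that has a <b> with start_text somewhere
    before it and a <b> with end_text somewhere after it, in document order."""
    nodes = list(root.iter())  # all elements, in document order
    starts = [i for i, n in enumerate(nodes) if n.tag == "b" and n.text == start_text]
    ends = [i for i, n in enumerate(nodes) if n.tag == "b" and n.text == end_text]
    if not starts or not ends:
        return []
    # "some start before and some end after" reduces to lying between the
    # first start marker and the last end marker.
    first_start, last_end = min(starts), max(ends)
    return [n.text for i, n in enumerate(nodes)
            if n.tag == "a" and first_start < i < last_end]


between = links_between(ET.fromstring(PAGE), "Staff in:", "Cast in:")
print(between)  # ['Movie 1', 'Movie 2', 'Movie 3']
```

The reduction to "first start, last end" shows the catch: if a second Cast in: heading appeared further down the page, every link before it, including Movie 4, would be selected as well.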