XPath. Getting specific siblings


I'm writing a Google App Engine project in Python. I need to scrape banks' websites to get their exchange rates.

An example of the HTML:

<tr> 
                            <td width="2"><img src="./images/zero.gif" width="2" height="2" border="0" /></td>
                            <td width="41" class="curvalsh" align="left" valign="middle"><font color="#DC241F">USD</font></td>
                            <td width="41" class="curvalsh" align="right" valign="middle"><b> 15.20 </b></td>
                            <td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
                            <td width="41" class="curvalsh" align="right" valign="middle"><b> 16.00 </b></td>
                            <td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
                            <td width="41" class="curvalsh" align="right" valign="middle"> - </td>
                            <td width="2" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
                        </tr>

I need to get the next two tags that contain text after the tag containing "USD" (the tags with 15.20 and 16.00).

What I've tried so far:

xpath = "//tr/td[text()='USD']/following-sibling::td/text()"

But this doesn't return anything, and it isn't exactly what I need anyway: I have to get only the 2 tags containing text after the "USD" tag, because there are also tags that don't contain any text.

EDIT:

I've also tried this, but it still returns nothing:

xpath = "//tr/td[text()='USD']/following-sibling::td[matches(text(),'(^|\W)[0-9]+.[0-9]+($|\W)','i')]/text()"

Best answer

Notice that the "USD" text is not a direct child of the td: it sits inside another tag (font), so td[text()='USD'] matches nothing. You can match the font element directly instead:

//tr/td/font[text()='USD']......

or

//tr//font[text()="USD"]......

In either case, you then go one level up using .. (much like when browsing a file system) to get back to the enclosing td.
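
To see why the td-level test fails and what .. does, here is a minimal sketch using only Python's standard library. The row is a simplified, well-formed stand-in for the one in the question (attributes and spacer cells dropped, which is an assumption made for brevity). Note that xml.etree.ElementTree supports only a small XPath subset, so this only illustrates the nesting and the parent step, not the full expression.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the row in the question
# (assumption: attributes and spacer cells removed for brevity).
ROW = """
<tr>
  <td><font color="#DC241F">USD</font></td>
  <td><b> 15.20 </b></td>
  <td> - </td>
</tr>
"""

row = ET.fromstring(ROW)
first_td = row.find("td")

# The cell has no direct text of its own; "USD" belongs to the nested <font>.
direct_text = first_td.text
font_text = first_td.find("font").text

# ".." steps from the <font> element back up to its enclosing <td>.
parents = row.findall(".//font/..")

print(repr(direct_text), repr(font_text), parents[0] is first_td)
```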

The values are wrapped in another tag as well (b). You can either address it directly with b/text(), or take all the text under the following siblings with //text().

This is how it might look:

//tr/td/font[text()='USD']/../following-sibling::td/b/text()
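
As a rough sketch of how this expression could be used from Python, the snippet below runs it with lxml (a third-party library; whether it is available in your App Engine runtime is something to check). The HTML is the row from the question wrapped in a table element, and the names ROW, XPATH and rates are made up for illustration.

```python
from lxml import html

# The row from the question, wrapped in <table> so the parser keeps the
# <tr>/<td> structure (the wrapper is an assumption for this sketch).
ROW = """
<table><tr>
  <td width="2"><img src="./images/zero.gif" width="2" height="2" border="0" /></td>
  <td width="41" class="curvalsh"><font color="#DC241F">USD</font></td>
  <td width="41" class="curvalsh"><b> 15.20 </b></td>
  <td width="4"><img src="./images/zero.gif" width="2" height="20" border="0"></td>
  <td width="41" class="curvalsh"><b> 16.00 </b></td>
  <td width="4"><img src="./images/zero.gif" width="2" height="20" border="0"></td>
  <td width="41" class="curvalsh"> - </td>
  <td width="2"><img src="./images/zero.gif" width="2" height="20" border="0"></td>
</tr></table>
"""

tree = html.fromstring(ROW)

# Only cells that contain a <b> contribute text, so the spacer cells
# and the " - " cell are skipped.
XPATH = "//tr/td/font[text()='USD']/../following-sibling::td/b/text()"
raw = tree.xpath(XPATH)

# Keep the first two values and convert them to numbers.
rates = [float(value.strip()) for value in raw[:2]]

# The //text() variant collects every text node under the following
# cells, which also picks up the "-" placeholder.
all_text = tree.xpath(
    "//tr/td/font[text()='USD']/../following-sibling::td//text()"
)
cleaned = [t.strip() for t in all_text if t.strip()]

print(rates)
print(cleaned)
```

The b/text() form is the stricter of the two: it relies on the rates being wrapped in b tags, which is what filters out the cells without a value.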