I'm writing a Google App Engine project in Python. I need to scrape the banks' sites and get the exchange rates from them.
Here is an example of the HTML:
<tr>
<td width="2"><img src="./images/zero.gif" width="2" height="2" border="0" /></td>
<td width="41" class="curvalsh" align="left" valign="middle"><font color="#DC241F">USD</font></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 15.20 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 16.00 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"> - </td>
<td width="2" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
</tr>
I need to get the next two tags with text after the tag containing "USD" (the tags with 15.20 and 16.00).
What I've already tried is:
xpath = "//tr/td[text()='USD']/following-sibling::td/text()"
But this doesn't return anything, and it isn't exactly what I need anyway: I have to specify that I want the 2 tags containing text after the "USD" tag, because there are also tags in between that don't contain any text.
EDIT:
I've also tried this, which still returns nothing:
xpath = "//tr/td[text()='USD']/following-sibling::td[matches(text(),'(^|\W)[0-9]+.[0-9]+($|\W)','i')]/text()"
Notice that there is another tag, font, inside the td before you get to the text you are searching for, so you need to search for that inner tag directly. In any case, you will then go one level up using .. (much like when browsing the file system). There is also another tag, b, hiding around the numbers, which you can refer to directly using b/text(), or you can take all the text under the next sibling with //text().
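The steps above can be sketched on the question's sample row, assuming Python 3 and lxml (my assumption; no library is named). The query matches the <font> element, steps up to its <td> with .., and then compares b/text() with //text() on the following siblings: b/text() only sees the two bold numbers, while //text() also picks up the "-" cell.

```python
# Assumptions: Python 3 and lxml (pip install lxml).
import lxml.html

# The sample row from the question, wrapped in <table> to give the row its normal parent.
ROW_HTML = """<table>
<tr>
<td width="2"><img src="./images/zero.gif" width="2" height="2" border="0" /></td>
<td width="41" class="curvalsh" align="left" valign="middle"><font color="#DC241F">USD</font></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 15.20 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 16.00 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"> - </td>
<td width="2" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
</tr>
</table>"""

doc = lxml.html.fromstring(ROW_HTML)

# Match the <font> whose text is "USD", then ".." steps up to its parent <td>.
usd_td = doc.xpath("//tr/td/font[text()='USD']/..")
print(usd_td[0].tag)  # td

# b/text(): only text inside a <b> child of each following <td>, so "-" is skipped.
bold = doc.xpath("//tr/td/font[text()='USD']/../following-sibling::td/b/text()")
print([t.strip() for t in bold])  # ['15.20', '16.00']

# //text(): every text node anywhere under each following <td>, so "-" shows up too.
every = doc.xpath("//tr/td/font[text()='USD']/../following-sibling::td//text()")
print([t.strip() for t in every])  # ['15.20', '16.00', '-']
```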
This is how it might look:
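One way to put the pieces together, as a sketch assuming Python 3 and lxml (usd_rates is a hypothetical helper name). Instead of stepping into <font> and back out with .., this variant tests the font child in a predicate on td; both select the same cell. It then keeps only the following siblings with non-blank text and takes the first two, which covers the "two tags containing text" requirement from the question.

```python
# Assumptions: Python 3 and lxml (pip install lxml); usd_rates is a hypothetical helper.
import lxml.html

ROW_HTML = """<table>
<tr>
<td width="2"><img src="./images/zero.gif" width="2" height="2" border="0" /></td>
<td width="41" class="curvalsh" align="left" valign="middle"><font color="#DC241F">USD</font></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 15.20 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"><b> 16.00 </b></td>
<td width="4" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
<td width="41" class="curvalsh" align="right" valign="middle"> - </td>
<td width="2" align="left" valign="middle"><img src="./images/zero.gif" width="2" height="20" border="0" hspace="1"></td>
</tr>
</table>"""

RATE_XPATH = (
    "//tr/td[font='USD']"                         # the cell whose <font> child reads USD
    "/following-sibling::td[normalize-space()]"   # later cells with non-blank text
    "[position() <= 2]"                           # only the first two of those
)

def usd_rates(html):
    """Return the two USD values as floats, or None if they can't be found."""
    doc = lxml.html.fromstring(html)
    cells = doc.xpath(RATE_XPATH)
    if len(cells) < 2:
        return None
    return tuple(float(td.text_content().strip()) for td in cells)

print(usd_rates(ROW_HTML))  # (15.2, 16.0)
```

The [position() <= 2] predicate is applied after the normalize-space() filter, so positions count only the non-blank cells; the empty image cells in between never use up a slot.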