Could someone explain to me how to scrape the content of <td> tags in a row whose <th> has the content "Row1 title" (in this case I actually need to match on the content of the <b> tag), without scraping the <th> tag (or any of its content) in the process? Here is my test HTML:
<table class="table_class">
<tbody>
<tr>
<th>
<b>
Row1 title
</b>
</th>
<td>2.660.784</td>
<td>2.944.552</td>
<td>Correct, has 3 td elements</td>
</tr>
<tr>
<th>
Row2 title
</th>
<td>2.660.784</td>
<td>2.944.552</td>
<td>Correct, has 3 td elements</td>
</tr>
</tbody>
</table>
The data I want to extract should come from these tags:
<td>2.660.784</td>
<td>2.944.552</td>
<td>Correct, has 3 td elements</td>
I have managed to write a function that returns the entire content of the table, but I would like to exclude the <th> node from the result and return only the data from the <td> nodes, whose content I can use for further parsing. Can anyone help me with this?
With enlive, something like this
should give you a sequence of all the td nodes, each of the form {:tag :td :attrs {...} :content (...)}. I am not aware that enlive lets you get the content of those nodes directly, but I could be wrong. You could then extract the content from the sequence with something along the lines of
(for [line ws-content] (apply str (:content line)))
In regard to the question you posted yesterday (I am assuming you are still working with that page) - the solution I gave there was a little complex, but it's also flexible. For example, if you change the tag-type function like this
(change the return value of all nodes to ::IgnoreNode except for :td), then it just gives you a sequence of the content of the :tds, which is probably close to what you want. Let me know if you need more help.
EDIT (in reply to comments below): I don't think selecting nodes based on their :content is possible with enlive alone, but you can certainly do so with Clojure. For example, something like
could work (you might have to tweak the (:content line) form a little).
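As an illustration of filtering rows by their th text in plain Clojure, here is a minimal sketch. It is not the original snippet from this answer. It assumes the rows are enlive-style node maps, such as you might get from something like (html/select page [:table.table_class :tr]) with html aliased to net.cgrand.enlive-html. The names sample-rows, node-text, children-with-tag, row-title and tds-for-title are hypothetical helpers invented for this example, and sample-rows is a hand-written stand-in for the parsed test HTML so the code runs without enlive.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the <tr> nodes of the test HTML, written in the
;; {:tag ... :attrs ... :content ...} shape that enlive nodes have.
(def sample-rows
  [{:tag :tr :attrs nil
    :content [{:tag :th :attrs nil
               :content ["\n" {:tag :b :attrs nil :content ["\nRow1 title\n"]} "\n"]}
              {:tag :td :attrs nil :content ["2.660.784"]}
              {:tag :td :attrs nil :content ["2.944.552"]}
              {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}]}
   {:tag :tr :attrs nil
    :content [{:tag :th :attrs nil :content ["\nRow2 title\n"]}
              {:tag :td :attrs nil :content ["2.660.784"]}
              {:tag :td :attrs nil :content ["2.944.552"]}
              {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}]}])

(defn node-text
  "Concatenates every string inside a node, descending into child nodes
  (so the text of <b> inside <th> is included)."
  [node]
  (if (string? node)
    node
    (apply str (map node-text (:content node)))))

(defn children-with-tag
  "Direct children of row whose :tag equals tag. Whitespace strings in
  :content are skipped because (:tag \"some string\") is nil."
  [tag row]
  (filter #(= tag (:tag %)) (:content row)))

(defn row-title
  "Trimmed text of the first <th> in a row, or nil if the row has none."
  [row]
  (some-> (first (children-with-tag :th row)) node-text str/trim))

(defn tds-for-title
  "Text of each <td> in the rows whose <th> text equals title.
  The <th> itself is only used for matching and never returned."
  [rows title]
  (for [row rows
        :when (= title (row-title row))
        td (children-with-tag :td row)]
    (str/trim (node-text td))))

(tds-for-title sample-rows "Row1 title")
;; => ("2.660.784" "2.944.552" "Correct, has 3 td elements")
```

Matching on the trimmed text of the whole th, rather than on a direct <b> child, means the same function works for rows like "Row2 title" that have no <b> tag. If you need to require the <b> tag, you would tweak row-title accordingly.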