Extract text from specific HTML location across multiple pages

Question

316 Views Asked by timoto At 20 June 2025 at 21:56

I have been experimenting with Jericho HTML Parser and Selenium IDE for the purpose of extracting text from a specific location inside HTML across multiple pages.

I have not found a simple example of how to do this, and I don't know Java.

For every HTML page in a folder, I would like to find whatever string of text sits in the 1st table, 4th row, 1st div:

<table>
 <tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
 <tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
 <tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>    
 <tr class="abc"><td class="xyz"><div align="center">The Text I want</div></td></tr>
</table>

And print the selected text to a txt file in a list like this:

    The Text I want
    Another Text I want

All the source files are stored locally and may contain bad HTML, so I figured Jericho might be best for this purpose. However, I'm happy to learn any method that achieves the desired result.

There is 1 solution below

**timoto** · Answer 1

Well, in the end I went with BeautifulSoup and used a Python script along these lines:

    from bs4 import BeautifulSoup

    # html_pathname and result_file are assumed to be set up earlier in the script
    # open the source HTML file
    with open(html_pathname, 'r') as html_file:
        # parse the file into a BeautifulSoup tag tree
        soup = BeautifulSoup(html_file, 'html.parser')
    # select "1st table, 6th tr, 1st td, 1st div": [4] is the 5th sibling
    # after the first tr, i.e. the 6th row (use [2] for the 4th row)
    div = soup.html.body.table.tr.find_next_siblings('tr')[4].td.div
    # write the div's text to the result txt
    print(' - writing to result txt')
    result_file.write(div.get_text() + '\n')
    print(' - ok!')
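
For the folder-wide case in the question, here is a rough sketch that uses only Python's standard-library `html.parser` instead of BeautifulSoup, in case installing a package is not an option. It is not the script I actually ran. The names `RowDivExtractor`, `extract_text` and `extract_folder` are made up for illustration, and it assumes the files end in `.html`, are readable as UTF-8, and that the first table contains no nested tables.

```python
from html.parser import HTMLParser
from pathlib import Path


class RowDivExtractor(HTMLParser):
    """Collect the text of the first <div> in row `row` of the first <table>."""

    def __init__(self, row=4):
        super().__init__()
        self.row = row          # 1-based index of the wanted <tr>
        self.tables_seen = 0    # <table> start tags seen so far
        self.rows_seen = 0      # <tr> start tags seen inside the first table
        self.in_div = False     # True while inside the wanted <div>
        self.done = False       # True once the search is over
        self.parts = []         # text fragments collected from the <div>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            self.tables_seen += 1
        elif tag == "tr" and self.tables_seen == 1:
            self.rows_seen += 1
        elif tag == "div" and self.tables_seen == 1 and self.rows_seen == self.row:
            self.in_div = True

    def handle_endtag(self, tag):
        # Stop when the wanted <div> closes or when the first table ends.
        if (tag == "div" and self.in_div) or (tag == "table" and self.tables_seen):
            self.in_div = False
            self.done = True

    def handle_data(self, data):
        if self.in_div:
            self.parts.append(data)


def extract_text(html, row=4):
    """Return the wanted text, or None if the row or div is missing."""
    parser = RowDivExtractor(row)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None


def extract_folder(folder, out_path, row=4):
    """Write one extracted line per *.html file in `folder` to `out_path`."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).glob("*.html")):
            html = path.read_text(encoding="utf-8", errors="replace")
            text = extract_text(html, row)
            if text is not None:
                out.write(text + "\n")
```

Calling something like `extract_folder("pages", "result.txt")` (both paths are placeholders) should then produce one line per page. Files where the row or div is missing are skipped rather than raising an error, which seemed the safer default for messy HTML.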
