Perl's HTML::Element - dumping just the descendants as HTML

I'm having trouble trying to output the contents of a matched node that I'm parsing:

<div class="description">some text <br/>more text<br/></div>

I'm using HTML::TreeBuilder::XPath to find the node (there's only one div with this class):

my $description = $tree->findnodes('//div[@class="description"]')->[0];

It finds the node (returned as an HTML::Element, I believe), but $description->as_HTML includes the element itself too. I just want everything contained inside the element as HTML:

some text <br/>more text<br/>

I could obviously strip the outer tags with a regex, but that feels messy, and I'm sure I'm just missing a function somewhere that does this.

2

There are 2 best solutions below

1
Gilles Quénot On

Try this:

my $description = $tree->findnodes('//div[@class="description"]/text()')->[0];

This is an XPath trick: text() selects the div's text node children rather than the div itself.

4
Jens Erat On

Use ./node() to fetch all child nodes, including both text and elements.

my $description = $tree->findnodes('//div[@class="description"]/node()');
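For comparison, here is a minimal sketch of a different route to the same result: instead of selecting the children with XPath, it walks the matched element's children with HTML::Element's content_list and serializes each one. The helper name inner_html and the sample markup are assumptions made for illustration; neither comes from the module. The exact serialization of <br/> may vary with the installed module version.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Entities qw(encode_entities);

# Hypothetical helper (not part of HTML::Element): returns the HTML of
# an element's children without the element's own start and end tags.
sub inner_html {
    my ($element) = @_;
    return join '', map {
        ref $_
            ? $_->as_HTML('<>&', undef, {})   # child element: serialize it
            : encode_entities($_, '<>&')      # plain text child: re-escape it
    } $element->content_list;
}

# Sample markup taken from the question.
my $html = '<div class="description">some text <br/>more text<br/></div>';
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# In list context findnodes returns the matching nodes themselves.
my ($description) = $tree->findnodes('//div[@class="description"]');

my $inner = inner_html($description);
print $inner, "\n";

# Free the tree explicitly once it is no longer needed.
$tree->delete;
```

content_list returns text children as plain strings and element children as HTML::Element objects, which is why the helper checks ref before calling as_HTML.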