Perl script with XML Twig seg faulting, child terminated with signal 11


I am trying to run a Perl script that constructs a few twigs. The script should take XML files and return the version numbers that are stored as an attribute in each file. Every time I try to parse a large file (23 MB), the script crashes with the following:

"Child 341 terminated with signal 11".

Code that calls the subroutine to get the required attribute:

my $version = $strm_obj->get_attr(file=>$file1,tag=>"config",attr=>"contentversion");
print "Version of $file1 is $version \n";
my $globalversion = $strm_obj->get_attr(file=>$file2,tag=>"config",attr=>"globalcontentversion");
print "Version of $file2 is $globalversion \n";

Subroutines that get the required attribute:

sub get_attr {
    my ( $self, %args ) = @_;
    my $file = $args{file};
    my $tag  = $args{tag};
    my $attr = $args{attr};
    my $val;
    $self->{_ATTR} = $attr;
    $self->{_TAG}  = $tag;
    test_log( DEBUG, "Value of tag is $tag, attribute is $attr" );
    my $twig = XML::Twig->new(
        twig_roots => {
            $tag => sub { $self->get_attr_helper( @_, $tag, \$val ); }
        }
    )->parsefile($file);
    if ($val) {
        test_log( INFO, "value of attribute $attr is $val" );
    }
    if ( !$val ) {
        test_log( INFO, "The attribute $attr that you are looking for, is not present in $file" );
        return -1;
    }
    $twig->purge;
    $twig->dispose;
    return $val;
}

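sub get_attr_helper {
    # XML::Twig calls the handler with ($twig, $elt), followed here by the
    # extra ($tag, \$val) arguments passed in get_attr.
    my ( $self, $obj, $tag, $act_tag, $val ) = @_;
    my $attr = $self->{_ATTR};
    #print "my attr is $attr\n";
    for my $node ( $tag->findnodes("//$self->{_TAG}") ) {
        if ( $node->att("$attr") ) {
            $$val = $node->att("$attr");
        }
    }
    $obj->purge;
}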

The XML files have the following format:

$file1:

<config contentversion="378">
  <tag1>
  .
  .
  .
  <tag n>
</config>

$file2:

<config globalcontentversion="378">
  <tag1>
  .
  .
  .
  <tag n>
</config>

I can't really provide the actual xml files here.

I know that this script uses at most about 20% of my machine's memory (2 GB RAM).

I have looked around and have been unable to find a solution to this.

How can I eliminate the seg faults?


It's hard to give a specific answer, because a segmentation fault means something is breaking messily (it's an invalid memory access).

XML is prone to a large memory footprint, and one of the biggest advantages of XML::Twig is its ability to parse and discard as it goes, using twig_handlers and purge.

This makes it perfect for partial extraction of things from XML.
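As an illustration of that parse-and-discard pattern, here is a minimal sketch (not from the original post). The helper name collect_attr_values and the sample element/attribute names are hypothetical. It collects one attribute from every repeated element of a given tag, purging after each one so the whole document is never held in memory. Note that it helps when the data is spread over many repeated elements; a twig_handlers handler on the root element fires only at the very end of the document, so it would not save memory there.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: collect the value of $attr from every <$tag>
# element in $file, purging as we go so memory use stays small.
sub collect_attr_values {
    my ( $file, $tag, $attr ) = @_;
    my @values;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once each <$tag> element has been completely parsed.
            $tag => sub {
                my ( $t, $elt ) = @_;
                my $v = $elt->att($attr);
                push @values, $v if defined $v;
                # Free everything parsed so far, including $elt.
                $t->purge;
            },
        },
    );
    $twig->parsefile($file);
    return @values;
}
```

For example, on a file containing `<root><item v="1"/><item v="2"/><item/></root>`, `collect_attr_values($path, 'item', 'v')` returns the two values `1` and `2`; the third item has no `v` attribute and is skipped.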

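I can't see specifically what's giving you a segfault, but pure Perl code rarely segfaults, so it's likely something external to your script, such as a compiled extension (XML::Twig is built on XML::Parser, which wraps the expat C library) or the perl interpreter itself.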

That aside, you seem to be doing something quite complicated to extract a version number from your files. (This assumes I haven't misread what you're trying to extract.)

Would something like this not suit your needs?

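use strict;
use warnings;
use XML::Twig;

sub get_attr {
    my ( $self, %args ) = @_;
    my $file = $args{file};
    my $tag  = $args{tag};
    my $attr = $args{attr};

    my $twig = XML::Twig->new()->parsefile($file);

    # first_elt also matches the root itself, so this works when $tag is
    # the document root (<config> in the samples). root->first_child($tag)
    # only looks below the root and would return undef in that case.
    my $elt = $twig->first_elt($tag);

    # Error check: return undef if the tag is not in the document.
    my $val = $elt ? $elt->att($attr) : undef;

    return $val;
}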

If your document root is always the 'config' element you're trying to read, you can simplify further:

my $val  = $twig->root->att($attr);

I've tried this and it works for both samples you've given so far. If you're still getting segfaults, I'd start by checking what you have installed: your perl version and the versions of XML::Twig and XML::Parser.

(It might be worth using a twig_handlers approach to trap the tag, but I don't think that's particularly necessary: its big advantage is purging as you go, which doesn't appear to be needed given the size of the problem.)
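If you did want a handler-based version, one option (a swapped-in technique, not the answerer's code) is XML::Twig's start_tag_handlers combined with finish_now: the handler runs as soon as the start tag is parsed, when the element's attributes already exist but its children do not, and finish_now then stops parsing. Since the version attribute sits on the opening <config> tag, the rest of the 23 MB file is never parsed. The helper name get_attr_from_start_tag is hypothetical; this is a sketch that assumes the attribute is on the first matching start tag.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: read $attr from the first <$tag> start tag in $file,
# then stop parsing immediately instead of building the whole tree.
sub get_attr_from_start_tag {
    my ( $file, $tag, $attr ) = @_;
    my $val;

    my $twig = XML::Twig->new(
        start_tag_handlers => {
            # Attributes are available here; the element has no children yet.
            $tag => sub {
                my ( $t, $elt ) = @_;
                $val = $elt->att($attr);
                $t->finish_now;    # abandon the rest of the document
            },
        },
    );
    $twig->parsefile($file);
    return $val;    # undef if the tag or attribute was not found
}
```

With the question's first sample, `get_attr_from_start_tag($file1, 'config', 'contentversion')` would return `378`.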

There is a known bug listed in the XML::Twig documentation:

http://search.cpan.org/~mirod/XML-Twig-3.48/Twig.pm#BUGS

segfault during parsing

This happens when parsing huge documents, or lots of small ones, with a version of Perl before 5.16.

This is due to a bug in the way weak references are handled in Perl itself.

The fix is either to upgrade to Perl 5.16 or later (perlbrew is a great tool to manage several installations of perl on the same machine).

Another, NOT RECOMMENDED, way of fixing the problem is to switch off weak references by writing XML::Twig::_set_weakrefs( 0); at the top of the code. This is totally unsupported, and may lead to other problems though.

I'm not sure this applies to you, though, because I wouldn't really call 23 MB a huge XML file, even bearing in mind that the in-memory footprint of parsed XML is about 10x the file size.
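To check whether that pre-5.16 bug could apply to your machine, a quick sketch like the following reports the relevant versions. The subroutine name twig_env_report is hypothetical; the 5.16 cutoff is taken from the XML::Twig BUGS note quoted above, and `$]` is compared numerically (5.16 is `5.016` in that form).

```perl
use strict;
use warnings;
use XML::Twig;
use XML::Parser;

# Hypothetical helper: report the versions relevant to the XML::Twig
# "segfault during parsing" bug, which affects Perls older than 5.16.
sub twig_env_report {
    my @lines = (
        sprintf( 'perl        %vd', $^V ),    # e.g. "perl        5.36.0"
        "XML::Twig   $XML::Twig::VERSION",
        "XML::Parser $XML::Parser::VERSION",
    );
    push @lines, 'WARNING: perl is older than 5.16; the weak-reference bug may apply'
        if $] < 5.016;
    return @lines;
}

print "$_\n" for twig_env_report();
```

If the warning line appears, upgrading perl (for example with perlbrew) is the supported fix the documentation recommends.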