I am trying to run a Perl script that constructs a few XML::Twig twigs. The script should take XML files and return the version numbers stored as an attribute in each file. Every time I try to parse a large file (23 MB), the script crashes with the following:
"Child 341 terminated with signal 11".
Code that invokes the subroutine to get the required attribute:
my $version = $strm_obj->get_attr(file=>$file1,tag=>"config",attr=>"contentversion");
print "Version of $file1 is $version \n";
my $globalversion = $strm_obj->get_attr(file=>$file2,tag=>"config",attr=>"globalcontentversion");
print "Version of $file2 is $globalversion \n";
Subroutines that get the required attribute:
sub get_attr{
    my ($self,%args) = @_;
    my $file = $args{file};
    my $tag = $args{tag};
    my $attr = $args{attr};
    my $val;
    $self->{_ATTR} = $attr;
    $self->{_TAG} = $tag;
    test_log(DEBUG,"Value of tag is $tag, attribute is $attr");
    my $twig= XML::Twig->new(
        twig_roots => { $tag
            => sub {$self->get_attr_helper(@_,$tag,\$val); } } )
        ->parsefile($file);
    if ($val){
        test_log(INFO,"value of attribute $attr is $val");
    }
    if (!$val){
        test_log(INFO,"The attribute $attr that you are looking for, is not present in $file");
        return -1;
    }
    $twig->purge;
    $twig->dispose;
    return $val;
}
sub get_attr_helper{
    my($self,$obj,$tag,$act_tag,$val) = @_;
    my $attr = $self->{_ATTR};
    #print "my attr is $attr\n";
    for my $node ($tag->findnodes("//$self->{_TAG}")){
        if ($node->att("$attr")){
            $$val = $node->att("$attr");
        }
    }
    $obj->purge;
}
The xml files are of the following format:
$file1 -
<config contentversion="378">
<tag1>
.
.
.
<tag n>
</config>
$file2 -
<config globalcontentversion="378">
<tag1>
.
.
.
<tag n>
</config>
I can't really provide the actual xml files here.
I know that this script consumes about 20% memory of my machine at the most (2GB RAM).
I have looked around and have been unable to find a solution to this.
How can I eliminate the seg faults?
It's hard to give a specific answer, because a segmentation fault only tells you that something is breaking messily at the memory level (the process touched memory it shouldn't have).
XML is quite prone to a big memory footprint, and one of the biggest advantages of XML::Twig is its ability to parse-and-discard using twig_handlers and purge. This makes it perfect for partial extraction of things from XML.
I can't see specifically what's giving you a segfault, but then, in Perl you don't often get segfaults, so it's likely something external.
That aside though - you seem to be doing something quite complicated to extract a version number from your files. (This is assuming I've not misread what you're trying to extract).
Would not something like this suit your needs?:
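A minimal sketch of that simpler approach, assuming the attribute lives on the first element with the given tag name. The helper name get_attr_simple is hypothetical, not something from your script; it just parses the file and reads the attribute, with no handlers at all:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (name is an assumption): parse the whole file,
# then read $attr from the first element whose tag is $tag.
sub get_attr_simple {
    my ($file, $tag, $attr) = @_;

    my $twig = XML::Twig->new;
    $twig->parsefile($file);

    # first_elt checks the root first, then walks the rest of the tree.
    my $elt = $twig->first_elt($tag);

    # att returns undef when the attribute is missing.
    return defined $elt ? $elt->att($attr) : undef;
}
```

You would call it as get_attr_simple($file1, 'config', 'contentversion') and test the result with defined rather than for truth, so a version of 0 is not treated as missing.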
Although if your 'doc root' is always that 'config' branch that you're trying to extract, you can simplify further:
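A sketch of the simplified version, assuming the attribute is always on the document root as in both of your samples. The name get_root_attr is made up for illustration:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (name is an assumption): read $attr straight
# off the document root, whatever the root tag happens to be.
sub get_root_attr {
    my ($file, $attr) = @_;

    my $twig = XML::Twig->new;
    $twig->parsefile($file);

    # root is the top-level element; att returns undef if $attr is absent.
    return $twig->root->att($attr);
}
```

Called as get_root_attr($file2, 'globalcontentversion'), this avoids searching the tree at all.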
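I've tried this - it works for both samples you've given thus far. If you're still segfaulting though, I'd be thinking in terms of checking what you have installed (XML::Twig sits on top of XML::Parser, which in turn uses the expat C library, so those are the places I'd look first).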
(It might be worth doing a 'twig handlers' approach to trap the tag, but I don't see that's particularly necessary as the big advantage would be purging as you go, which doesn't appear to be necessary given the size of the problem).
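For completeness, a rough sketch of what that handler-based variant could look like. It assumes the attribute is available as soon as the opening tag is seen, so it grabs it in start_tag_handlers and purges finished elements as parsing goes on. The name get_attr_streaming is hypothetical, and purging after every element is just the simplest way to show the idea, not a tuned choice:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (name is an assumption): capture $attr when the
# opening <$tag> is seen, and discard parsed elements as we go so the
# full tree is never held in memory.
sub get_attr_streaming {
    my ($file, $tag, $attr) = @_;
    my $value;

    my $twig = XML::Twig->new(
        # Fires when the start tag has been parsed; attributes are
        # already set on the (still empty) element at this point.
        start_tag_handlers => {
            $tag => sub {
                my ($t, $elt) = @_;
                $value = $elt->att($attr) unless defined $value;
            },
        },
        # Fires for every completed element; purge frees everything
        # that has been fully parsed so far.
        twig_handlers => {
            _all_ => sub { $_[0]->purge },
        },
    );
    $twig->parsefile($file);

    return $value;
}
```

This still reads the whole file, so it trades a little speed for a flat memory profile.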
There is a bug listed in XML::Twig:
http://search.cpan.org/~mirod/XML-Twig-3.48/Twig.pm#BUGS
I'm not really sure this applies to you though, because I wouldn't really call 23 MB a huge XML file (even bearing in mind that the in-memory footprint of parsed XML is roughly 10x the file size).