I am processing a large XML file (about 1 GB) with XML::Twig using twig_handlers. The file is divided into Entry elements, and each Entry tag contains all of its sub-tags.
I want a mechanism to check whether an Entry was already processed in a previous run: save its MD5 digest, and on the next run skip any entry whose digest is already recorded. Currently I do this check inside the Entry handler, which does not help much because the twig for the entry is already built before I check the digest. Could someone suggest whether it is possible to check the digest of every entry before building the twig?
Here is a synopsis of my code:
XML::Twig->new(
    twig_handlers => {
        'Entry' => sub {
            # $_ is the fully parsed Entry element
            if (not exists_digest($_->outer_xml)) {
                # do something
            }
        },
    },
)->parsefile('myfile.xml');
I'm not sure if this is an available option with XML::Twig (it could be, I just don't know) but you can do this on your own using Digest::MD5 and a hash. Use a hash to keep a record of what MD5 values you've already seen:
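A minimal sketch of that idea, under some assumptions: the function name `exists_digest` is taken from the question, but its body here is only one possible implementation; the file name `seen_digests.sto`, the helper `save_digests`, and the use of Storable to keep the digests between runs are hypothetical choices, not something the question specifies. The XML string is encoded to UTF-8 first because `md5_hex` dies on strings containing wide characters.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);
use Storable qw(store retrieve);

# Hypothetical file used to remember digests between runs
my $digest_file = 'seen_digests.sto';

# Load the digests saved by an earlier run, if there are any
my %seen = -e $digest_file ? %{ retrieve($digest_file) } : ();

# Returns 1 if this XML string was seen before.
# Otherwise records its digest and returns 0.
sub exists_digest {
    my ($xml) = @_;
    my $digest = md5_hex(encode_utf8($xml));
    return 1 if $seen{$digest};
    $seen{$digest} = 1;
    return 0;
}

# Hypothetical helper: call once after parsing so the next run can skip entries
sub save_digests {
    store(\%seen, $digest_file);
}
```

With this in place the handler from the question can stay as it is, with a call to `save_digests()` after `parsefile` returns. Note that this only skips your own processing of an entry: the handler is called after the Entry element has been parsed, so the twig for it has still been built by then.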