I'm using MIME::Parser to parse a bunch of emails. I want a different subdirectory for every mail message, with consistent file naming within each subdirectory. So far so good, but there are a couple of inadequacies in MIME::Parser. One is that it insists on putting a process ID into the filename of MIME parts that aren't otherwise named. It's ugly, but I can just rename these files after the fact, though I'd prefer to override the naming somehow. The worse part is that it numbers all parts that don't have names (which is fine), but these numbers never reset:
    dir-for-email1
        part-65432-1.txt
        part-65432-2.txt
    dir-for-email2
        part-65432-3.txt
        part-65432-4.txt
I want each email subdirectory to have its own numbering starting at 1. I could also fix this by renaming, but at that point the code is getting non-trivial, and there must be a better way?
Internally, MIME::Parser creates an instance of MIME::Parser::Filer, which uses a "my $GFileNo" variable to track file numbers. A new instance of MIME::Parser::Filer is created both when you create a new instance of MIME::Parser (which I am doing with every new email) and when you call output_under() (which I have also tried). So I'm surprised that creating a new instance does not reset $GFileNo (though I suspect this comes from the lexical scoping of the "my" declaration, and I don't have a deep guru understanding of Perl's scoping rules). I even tried storing my MIME::Parser instance in a local variable, which I believe should definitely be destroyed at the end of each call to the subroutine. That didn't help.
I don't know much about subclasses and overrides in Perl. But if I did try this, it seems I'd have to override both MIME::Parser::Filer (the output_filename subroutine) and the MIME::Parser package, to get it to use my new subclass, since I don't directly create MIME::Parser::Filer instances.
I did try to create a subclass of MIME::Parser::Filer, and add a new subroutine that resets GFileNo, and then called that from inside my loop for each file, but again I suspect that the subclass GFileNo is a different instance than the original GFileNo. At any rate, I don't exactly know what I'm doing and it didn't work.
    package MyFiler;
    use parent 'MIME::Parser::Filer';
    sub reset_numbering {
        $GFileNo=0;
    }
    1;
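As a side note, the scoping suspicion can be reproduced without MIME::Parser at all. The following is a minimal sketch using hypothetical packages (Counter stands in for the library module and MyCounter for the subclass; neither is the real MIME::Parser::Filer source). It assumes the library keeps its counter in a file-scoped "my" variable, as described above.

```perl
use strict;
use warnings;

{
    package Counter;              # hypothetical stand-in for the library module
    my $count = 0;                # lexical: only code inside this block can see it
    sub new       { return bless {}, shift }
    sub next_name { return "part-" . ++$count }
}

{
    package MyCounter;            # hypothetical subclass
    our @ISA = ('Counter');
    our $count;                   # a different variable: $MyCounter::count
    sub reset_numbering { $count = 0 }   # never touches Counter's lexical
}

my @names;
push @names, MyCounter->new->next_name;
MyCounter->reset_numbering;       # resets only $MyCounter::count
push @names, MyCounter->new->next_name;
print "$_\n" for @names;          # the numbering keeps counting up
```

The counter belongs to the enclosing scope, not to any object, so neither a new instance nor a subclass variable of the same name can reach it.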
What is the right way to do this? And is there an evil, incorrect way of reaching inside of MIME::Parser::Filer and just changing $GFileNo?
Is there an easy fix I'm not seeing?
EDIT: adding code as requested
    sub parsemail {
        my $file = shift(@_);
        my $base = $file;
        $base =~ s|^.*/||;         # strip the leading directories
        $base =~ s|\.emlx$||;      # strip the .emlx extension
        my $outdir = "$ROOTDIR/$base";
        mkdir($outdir) || die("$!");
        my $parser = MIME::Parser->new;
        $parser->output_dir($outdir);
        $parser->output_prefix("part");
        my $raw = "";
        open(my $in, '<', $file) || die("$file: $!");
        <$in>;    # Eat the length line Apple adds at top of emlx
        while (<$in>) { $raw .= $_; }
        close($in);
        my $mimeparsed = $parser->parse_data($raw);
    }
The docs on MIME::Parser aren't great, but you don't have to override the main module. The filer method sets the object to use, so you can hand the parser an instance of your own subclass.
Now, override output_filename and somehow tie the sequence number to the directory name. Each directory name could maintain its own counter in whatever way you want to do that. You'd completely ignore how the base Filer is doing that (and really, it shouldn't be using a variable like that). However, note that it looks like this number never resets because resetting it would allow for a collision in names. You'd have to figure out how you want to handle that once you allow the code to produce the same base filename (not path) more than once.
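The approach above might look something like the following. This is a sketch under stated assumptions, not a tested drop-in: PerDirFiler and its PerDirFiler_count key are hypothetical names, the extension table is a deliberately tiny stand-in for the base class's own logic, and it assumes MIME::Parser::FileInto lives in the same file as MIME::Parser::Filer (hence -norequire). The counter is stored on the filer object, so it restarts whenever a new filer is created for a new directory.

```perl
use strict;
use warnings;

{
    package PerDirFiler;                       # hypothetical name
    use MIME::Parser::Filer;                   # assumed to define MIME::Parser::FileInto too
    use parent -norequire, 'MIME::Parser::FileInto';

    # Assumed, minimal type-to-extension table; extend as needed.
    my %EXT = ('text/plain' => '.txt', 'text/html' => '.html');

    # Used for parts that have no usable filename of their own.
    sub output_filename {
        my ($self, $head) = @_;
        my $n   = ++$self->{PerDirFiler_count};   # per-object counter, no process ID
        my $ext = $EXT{ lc $head->mime_type } || '.dat';
        return $self->output_prefix . "-$n$ext";
    }
}

use File::Temp qw(tempdir);
use MIME::Parser;

my $outdir = tempdir(CLEANUP => 1);            # stands in for the per-email directory

my $filer = PerDirFiler->new($outdir);         # FileInto takes the output directory
$filer->output_prefix('part');

my $parser = MIME::Parser->new;
$parser->filer($filer);                        # use our filer instead of the default

my $raw = "From: a\@example.com\nSubject: test\n"
        . "Content-Type: text/plain\n\nHello\n";
my $entity = $parser->parse_data($raw);
print $entity->bodyhandle->path, "\n";         # where the unnamed part was written
```

Because the question's parsemail creates a fresh parser for every email, creating a fresh filer alongside it gives each directory its own numbering. If the same directory is ever reused with a new filer, the names can collide, which is the caveat mentioned above.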