Perl, simplest way to override a class that's called by another class? (In MIME::Parser)


I'm using MIME::Parser to parse a bunch of emails. I want a different subdirectory for every mail message, with consistent file naming within each subdirectory. So far so good, but there are a couple of inadequacies in MIME::Parser. One is that it insists on putting a process ID into the filename of MIME parts that aren't otherwise named. It's ugly, but I can just rename these files after the fact, though I'd prefer to override this somehow. The worse part is that it numbers all parts that don't have names (which is fine), but these numbers never reset.

dir-for-email1
  part-65432-1.txt
  part-65432-2.txt
dir-for-email2
  part-65432-3.txt
  part-65432-4.txt

I want each email subdirectory to have its own numbering starting at 1. I could also fix this by renaming, but at that point the code is getting non-trivial, and surely there must be a better way?

Internally, MIME::Parser creates an instance of MIME::Parser::Filer, which uses a "my $GFileNo" variable to track file numbers. New instances of MIME::Parser::Filer are created both when you create a new instance of MIME::Parser (which I am doing with every new email) and when you call output_under() (which I have also tried). So I'm surprised that creating a new instance does not reset $GFileNo. I suspect this comes from the lexical scoping of the "my" declaration, although I don't have a deep guru understanding of Perl's scoping rules. I even tried storing my MIME::Parser instance in a local variable, which I believe should be destroyed at the end of each call to the subroutine. That didn't help.
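The behavior described above can be reproduced without MIME::Parser at all. The following is a minimal sketch in which Counter::Demo is a hypothetical stand-in for MIME::Parser::Filer: a "my" variable declared at file level is initialized once, and every object of the class shares it, so destroying and recreating objects never resets it.

```perl
use strict;
use warnings;

package Counter::Demo;   # hypothetical stand-in for MIME::Parser::Filer

my $GFileNo = 0;         # file-scoped lexical: initialized once, not per object

sub new { return bless {}, shift }

sub next_name {
    my ($self) = @_;
    ++$GFileNo;          # every object increments the same variable
    return "part-$GFileNo";
}

package main;

my $first  = Counter::Demo->new->next_name;   # object is destroyed right after the call
my $second = Counter::Demo->new->next_name;   # brand-new object, same counter

print "$first $second\n";   # part-1 part-2
```

The counter belongs to the file's lexical scope rather than to any object, which is why a fresh parser per email makes no difference.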

I don't know much about subclasses and overrides in Perl. But if I did try this, it seems I'd have to override not only MIME::Parser::Filer (the output_filename subroutine) but also the MIME::Parser package, to get it to use my new subclass, since I don't directly create MIME::Parser::Filer instances.

I did try to create a subclass of MIME::Parser::Filer, and add a new subroutine that resets GFileNo, and then called that from inside my loop for each file, but again I suspect that the subclass GFileNo is a different instance than the original GFileNo. At any rate, I don't exactly know what I'm doing and it didn't work.

package MyFiler;

use parent 'MIME::Parser::Filer';

sub reset_numbering {
  # Attempted reset of the counter (this is the part that did not work)
  $GFileNo=0;
}

1;
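A likely reason the reset had no effect: an unqualified $GFileNo inside package MyFiler refers to the package variable $MyFiler::GFileNo, not to the lexical declared inside the parent's file. The sketch below illustrates this with two hypothetical stand-in classes, Base::Demo and Sub::Demo; it is not the module's own code.

```perl
use strict;
use warnings;

{
    package Base::Demo;          # hypothetical stand-in for MIME::Parser::Filer
    my $GFileNo = 0;             # lexical: visible only inside this block
    sub new     { return bless {}, shift }
    sub next_no { return ++$GFileNo }
}

{
    package Sub::Demo;           # hypothetical stand-in for MyFiler
    our @ISA = ('Base::Demo');
    our $GFileNo;                # package variable $Sub::Demo::GFileNo
    sub reset_numbering { $GFileNo = 0 }
}

my $filer = Sub::Demo->new;
$filer->next_no;
$filer->next_no;
$filer->reset_numbering;         # resets only $Sub::Demo::GFileNo
my $after_reset = $filer->next_no;

print "$after_reset\n";          # 3
```

Inheritance shares methods, not lexical variables, so the subclass has no name by which to reach the parent's counter.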

What is the right way to do this? And is there an evil, incorrect way of reaching inside of MIME::Parser::Filer and just changing $GFileNo?

Is there an easy fix I'm not seeing?

EDIT: adding code as requested

sub parsemail {
  my ($file) = @_;

  # Name the output directory after the message file, minus path and .emlx suffix
  my $base = $file;
  $base =~ s|^.*/||;
  $base =~ s|\.emlx$||;
  my $outdir = "$ROOTDIR/$base";
  mkdir($outdir) || die("mkdir $outdir: $!");

  my $parser = MIME::Parser->new;

  $parser->output_dir($outdir);
  $parser->output_prefix("part");

  my $raw = "";
  open(my $in, '<', $file) || die("open $file: $!");
  <$in>; # Eat the length line Apple adds at top of emlx
  while (<$in>) { $raw .= $_; }
  close($in);
  my $mimeparsed = $parser->parse_data($raw);
}

There are 2 best solutions below

4
brian d foy On

The docs on MIME::Parser aren't great, but you don't have to override the main module. The parser's filer method sets the filer object to use:

my $parser = MIME::Parser->new( ... );
my $my_filer = MyFiler->new(...);

$parser->filer($my_filer);

Now, override output_filename and somehow tie the sequence number to the directory name. Each directory name could maintain its own counter in whatever way you want to do that. You'd completely ignore how the base Filer is doing that (and really, it shouldn't be using a variable like that). However, note that this number probably never resets because resetting it would allow for a collision in names. You'd have to decide how to handle that once your code can produce the same base filename (not path) more than once.
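A minimal sketch of that idea, assuming MIME-tools is installed. MyFiler subclasses MIME::Parser::FileInto (the single-directory filer that lives in the same file as MIME::Parser::Filer) and keeps one counter per output directory. The %count_for_dir and %ext_for_type names, the tiny extension map, and the /tmp path are illustrative assumptions, not part of the module.

```perl
package MyFiler;
use strict;
use warnings;
use MIME::Parser;   # loads MIME::Parser::Filer, which also defines MIME::Parser::FileInto
use parent -norequire, 'MIME::Parser::FileInto';

# One counter per output directory, so numbering restarts at 1 for each message.
my %count_for_dir;

# Illustrative extension map (an assumption, not the module's own table).
my %ext_for_type = ('text/plain' => '.txt', 'text/html' => '.html');

# Called by the filer when it has to synthesize a name for a part.
sub output_filename {
    my ($self, $head) = @_;
    my $ext = $ext_for_type{ $head->mime_type } // '.dat';
    my $n   = ++$count_for_dir{ $self->output_dir($head) };
    return "part-$n$ext";    # no process ID in the name
}

package main;

my $outdir = '/tmp/dir-for-email1';   # hypothetical path; must exist before parsing
my $parser = MIME::Parser->new;
$parser->filer( MyFiler->new($outdir) );
```

Because the counter is keyed by directory, parsing two messages into the same directory continues the numbering rather than reusing names, which sidesteps the collision concern as long as each message gets its own directory.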

0
thomasafine On

I went back to trying to override the output_filename function, and this time it worked. Not sure where I went wrong before.

Basically, I copied the original output_filename, stripped away the part that added the process id, and made the GFileNo variable fully global, and then just externally reset it to zero before each call to parse an email. Cheesy but adequate.

Here's the code for the altered output_filename:

$GlobalFileNo=0;

local *MIME::Parser::Filer::output_filename = sub {
    my ($self, $head) = @_;

    ### Get the recommended name:
    my $recommended = $head->recommended_filename;

    ### Get content type:
    my ($type, $subtype) = split m{/}, $head->mime_type; $subtype ||= '';

    ### Get recommended extension, being quite conservative:
    my $recommended_ext = (($recommended and ($recommended =~ m{(\.\w+)\Z}))
                           ? $1
                           : undef);

    ### Try and get an extension, honoring a given one first:
    my $ext = ($recommended_ext ||
               $self->{MPF_Ext}{"$type/$subtype"} ||
               $self->{MPF_Ext}{"$type/*"} ||
               $self->{MPF_Ext}{"*/*"} ||
               ".dat");

    ### Number the part, leaving out the process ID:
    ++$GlobalFileNo;
    return ("part-$GlobalFileNo$ext");
};
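One thing to keep in mind with this approach is that "local" only replaces the subroutine until the enclosing block (or file) finishes, so the parsing has to happen while that scope is still active. A small sketch of the scoping, using a hypothetical Greeter::Demo class in place of MIME::Parser::Filer:

```perl
use strict;
use warnings;

{
    package Greeter::Demo;       # hypothetical class standing in for MIME::Parser::Filer
    sub new  { return bless {}, shift }
    sub name { return 'original' }
}

my $obj = Greeter::Demo->new;
my @seen;

{
    # Temporarily replace the method for the duration of this block.
    local *Greeter::Demo::name = sub { return 'override' };
    push @seen, $obj->name;      # inside the block: the replacement runs
}

push @seen, $obj->name;          # block exited: the original is restored
print "@seen\n";                 # override original
```

Dropping "local" would make the replacement permanent for the rest of the program, which may be acceptable in a one-off script but affects every other user of the class in the same process.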