CAM::PDF Error "Expected String Closing" when editing PDF file


I'm using CAM::PDF with Perl to delete or replace some text in single-page PDF files.

my $repl_str = "redacted";

my $pdf = CAM::PDF->new($file_name) or die("Couldn't read PDF $file_name: $CAM::PDF::errstr");
my $content = $pdf->getPageContent(1);
my $text = $pdf->getPageText(1);

my @del_lines;

my @lines = split (/\n/, $text); # Splits lines into array @lines

# Collect the lines of extracted text that should be removed
foreach my $l (@lines) {
  if ($l =~ /sometexttotemove/ || $l =~ /othertexttoremove/) {
    push @del_lines, $l;
  }
}

for (@del_lines) {
  s/([\(\)])/\\$1/g; # in the PDF content stream, parens inside strings are backslash-escaped
  my $m = quotemeta; # escape regex metacharacters so the line is matched literally
  $content =~ s/$m/$repl_str/;
}

$pdf->setPageContent(1, $content);
$pdf->cleanoutput($outfile) or die("Couldn't write ${mrn}_${dt}.pdf: $CAM::PDF::errstr");

This works 99.9% of the time, but very rarely I get this error and the script terminates:

Expected string closing
250  >>...

If I look at the $content variable after it's extracted and before any manipulation, I get this:

BT 450 20430 Td (sometexttotemove) Tj ET
BT 7750 20430 Td (othertexttoremove) Tj ET

and after replacing the strings in $content I get this:

BT 450 20430 Td () Tj ET
BT 7750 20430 Td () Tj ET

This is the same regardless of whether the script crashes or not.
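One thing that may be worth ruling out is whether the edited content stream still has balanced string delimiters. The following is only a rough sketch of such a check: `literal_strings_balanced` is a hypothetical helper, not part of CAM::PDF, and it assumes the stream contains no comments, hex strings, or inline binary data that would confuse a simple scan.

```perl
use strict;
use warnings;

# Hypothetical helper (not a CAM::PDF method): returns true if every
# unescaped "(" in the text has a matching unescaped ")".
sub literal_strings_balanced {
    my ($content) = @_;
    my $depth = 0;
    # Match either a backslash escape (two characters) or a bare paren.
    while ($content =~ /(\\.|[()])/gs) {
        my $tok = $1;
        next if length($tok) == 2;        # escaped character, ignore it
        if ($tok eq '(') {
            $depth++;
        }
        else {
            return 0 if --$depth < 0;     # ")" with no opening "("
        }
    }
    return $depth == 0 ? 1 : 0;
}
```

It could be called on $content just before setPageContent, for example `warn "unbalanced string in page content" unless literal_strings_balanced($content);`. If it never fires on the failing files, the problem is probably somewhere other than the page content.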

Can someone explain why this error is happening?

At first this seemed to happen only when the text being removed contained parentheses, but escaping them (as in the code above) does not fix the problem.
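For reference, in a PDF literal string the backslash is itself the escape character, so text taken from getPageText would need its backslashes escaped as well as its parentheses before it can match the raw content stream. A minimal sketch of that escaping follows; `pdf_escape_literal` is a hypothetical helper name, and the sketch assumes the page stores the text as a plain literal string rather than a hex string or a string split across several operators.

```perl
use strict;
use warnings;

# Hypothetical helper (not a CAM::PDF method): escape plain text so it
# matches how it would appear inside a PDF literal string "( ... )".
sub pdf_escape_literal {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;       # backslashes first, so later escapes are not doubled
    $text =~ s/([()])/\\$1/g;   # then parentheses
    return $text;
}
```

The result would still go through quotemeta before being used in the substitution. Note that PDF also allows balanced parentheses to appear unescaped inside a literal string, so a line escaped this way may simply fail to match on some files.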

I found the error message in the CAM::PDF source code at https://github.com/gitpan/CAM-PDF/blob/master/lib/CAM/PDF.pm

but I still don't understand why the error happens only on some PDFs.
