I've been using the CAM::PDF module to try to edit PDF documents at work. Essentially, I'm just trying to change the date on the documents automatically to show that they have been reviewed recently.
Unfortunately, although my code reports that it is making changes to the PDF objects ($pdf->{changes}), and although I have given the PDFs it is trying to change the most permissive access (anyone can read and write them), the output PDFs never show these changes. I have also grepped the object-node tmp files that the script writes out en masse, and none of them shows any sign of the old date after the code has run; yet when I view the PDF afterwards, the old date is still on it. Has anyone encountered this before, or can anyone suggest anything?
Doing this manually isn't an option: I have LOTS of these files to sort out at work, so I want a script I can run against multiple files at once. Other than the dates written on them, the documents have to keep looking more or less the same (it would be OK if they changed in size a little, but not OK if they completely changed in appearance).
I strictly followed the example changepdfstring.pl (https://metacpan.org/pod/distribution/CAM-PDF/bin/changepdfstring.pl) from the author of CAM::PDF for my code, then tried different variations of it to try to get things to work, so I'm bemused that nothing has worked.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CAM::PDF;
    use Data::Dumper;

    my $pdf = CAM::PDF->new('Order fulfilment process flowchart.pdf')
        or die "Could not open PDF: $CAM::PDF::errstr\n";
    if (!$pdf->canModify())
    {
        die "This PDF forbids modification\n";
    }

    my $olddate = "15.02.2019";
    my $newdate = "22.02.2022";

    # Try the replacement on every object listed in the xref table.
    foreach my $objectnumber (keys %{$pdf->{xref}}) {
        my $objectnode = $pdf->dereference($objectnumber);
        $pdf->changeString($objectnode, {$olddate => $newdate});
    }

    # Show which objects CAM::PDF considers changed.
    my $change = $pdf->{changes};
    print Dumper($change);

    # Dump every object node to its own tmp file so it can be grepped later.
    my $count = 0;
    foreach my $objectnumber (keys %{$pdf->{xref}}) {
        my $objectnode = $pdf->dereference($objectnumber);
        $count++;
        open(my $ono, '>', "tmp.objectnode.$count")
            or die "Cannot write tmp.objectnode.$count: $!\n";
        print {$ono} Dumper($objectnode);
        close($ono);
    }

    if (!scalar %{$pdf->{changes}})
    {
        die "no changes were made :(";
    }

    $pdf->preserveOrder();
    $pdf->cleanoutput('pleasework.pdf');
Any help or advice would be greatly appreciated.
I found that the text I was trying to edit was not actually a contiguous string of characters in the PDF; rather, it was inside a TJ operator in a BT text object. I cannot see any provision in the CAM::PDF library for handling text that sits in TJ lines (although perhaps there is, @ChrisDolan?), hence CAM::PDF was unable to operate on it or swap it out. After decompressing all the streams (where applicable), I found this TJ line, which had the text I wished to operate on:
I don't believe it would have been possible for CAM::PDF to act on TJ lines; perhaps it can only act on Tj lines.
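To illustrate why a plain string search misses text held in a TJ line, here is a minimal sketch in plain Perl (it does not use CAM::PDF). The content-stream fragment, the kerning numbers and the helper name `tj_text` are made up for illustration and are not taken from the real document. A TJ operator takes an array of string pieces interleaved with numeric spacing adjustments, so the date never appears as one contiguous run of characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join the literal (...) pieces of a TJ array body,
# ignoring the numeric kerning adjustments between them.
# Limitations: hex strings <...> and nested unescaped parentheses are not handled.
sub tj_text {
    my ($array_body) = @_;
    my $text = '';
    # Match each (...) literal string, allowing backslash escapes inside it.
    while ($array_body =~ /\(((?:\\.|[^\\()])*)\)/g) {
        my $piece = $1;
        $piece =~ s/\\([\\()])/$1/g;    # undo only the \\, \( and \) escapes
        $text .= $piece;
    }
    return $text;
}

# Made-up fragment of a decompressed content stream (an assumption).
my $stream = "BT /F2 10 Tf [(15.0) -12 (2.20) 7 (19)] TJ ET";

# A contiguous search for the date fails because the string is split up.
print "plain search finds date: ",
    (index($stream, '15.02.2019') >= 0 ? "yes" : "no"), "\n";

# Joining the pieces of each TJ array recovers the visible text.
while ($stream =~ /\[(.*?)\]\s*TJ/gs) {
    print "TJ text: ", tj_text($1), "\n";
}
```

Running the same loop over each decompressed page stream would be one way to find which TJ line carries the date before deciding how to edit it.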
For anyone looking for a quick answer to this same problem, this "dirty" script worked for me in this case:
Essentially, to change someone else's name on the document to my name, I swap out the TJ for a Tj, which makes it simpler to insert my change (but potentially messy). To get this to display with capitalised letters, I had to reverse the string and change the font (F) it was under from F2 to F0.
For the TJ line relating to the date, I swapped out the characters inside the TJ for the date I wished to change it to, which meant I had to abide by the "unfriendly" syntax that TJ operator lines use.
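The TJ-to-Tj swap described above could be sketched roughly as follows. This is not the script I used, only a minimal illustration in plain Perl that works on an already decompressed content stream held in a string; the helper names (`pdf_escape`, `tj_text`, `replace_in_tj`) and the sample fragment are hypothetical. It drops the kerning numbers, so spacing may shift slightly, and it does not touch the font, so it assumes the current font can display the new characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special inside a PDF literal string.
sub pdf_escape {
    my ($s) = @_;
    $s =~ s/([\\()])/\\$1/g;
    return $s;
}

# Join the literal (...) pieces of a TJ array body, dropping kerning numbers.
sub tj_text {
    my ($array_body) = @_;
    my $text = '';
    while ($array_body =~ /\(((?:\\.|[^\\()])*)\)/g) {
        my $piece = $1;
        $piece =~ s/\\([\\()])/$1/g;
        $text .= $piece;
    }
    return $text;
}

# Replace each TJ array whose joined text contains $old with a single
# Tj string in which $old has been swapped for $new. Other TJ arrays
# are left exactly as they were.
sub replace_in_tj {
    my ($stream, $old, $new) = @_;
    $stream =~ s{(\[(.*?)\]\s*TJ)}{
        my ($whole, $body) = ($1, $2);    # copy captures before any inner match
        my $text    = tj_text($body);
        my $changed = ($text =~ s/\Q$old\E/$new/g);
        $changed ? '(' . pdf_escape($text) . ') Tj' : $whole;
    }gse;
    return $stream;
}

# Made-up fragment of a decompressed content stream (an assumption).
my $stream = "BT /F2 10 Tf [(15.0) -12 (2.20) 7 (19)] TJ ET";
print replace_in_tj($stream, '15.02.2019', '22.02.2022'), "\n";
```

To apply something like this to a real file, the page content would first have to be read and then written back, for example with CAM::PDF's getPageContent and setPageContent; I have not verified that route on these documents.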