The script below removes all images from a PDF file using CAM::PDF. The output, however, is corrupt: PDF readers can still open it, but they report errors. For instance, mupdf says:
error: no XObject subtype specified
error: cannot draw xobject/image
warning: Ignoring errors during rendering
mupdf: warning: Errors found on page
Now, the CAM::PDF page on CPAN lists the deleteObject() method under "Deeper utilities", presumably meaning that it is not intended for public use. Moreover, it warns that:
This function does NOT take care of dependencies on this object.
My question is: what is the right way to remove objects from a PDF file using CAM::PDF? If the issue has to do with dependencies, how can I remove an object while taking care of its dependencies?
For ways to remove images from a PDF with other tools, see the related question here.
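use strict;
use warnings;
use CAM::PDF;

# Open the PDF named on the command line.
my $pdf = CAM::PDF->new( shift ) or die $CAM::PDF::errstr;

# Walk every object listed in the cross-reference table.
foreach my $objnum ( sort { $a <=> $b } keys %{ $pdf->{xref} } ) {
    my $xobj = $pdf->dereference( $objnum ) or next;
    next unless $xobj->{value}->{type} eq 'dictionary';

    my $im = $xobj->{value}->{value};

    # Image XObjects carry /Type /XObject and /Subtype /Image.
    if (    defined $im->{Type}
        and defined $im->{Subtype}
        and $pdf->getValue( $im->{Type} ) eq 'XObject'
        and $pdf->getValue( $im->{Subtype} ) eq 'Image' )
    {
        # Per the docs, this does NOT take care of dependencies on the object.
        $pdf->deleteObject( $objnum );
    }
}

# Write the modified PDF to standard output.
$pdf->cleanoutput( '-' );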
This uses CAM::PDF, but takes a slightly different approach. Rather than attempting to delete the images, which is pretty hard, it replaces each image with a transparent image.
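Part of why deletion is hard is that the images are still referenced after deleteObject(). Each page's /Resources /XObject dictionary maps a name such as /Im1 to the image object, and the page's content stream paints it with an operator like `/Im1 Do`; mupdf's "no XObject subtype specified" error is consistent with those references surviving. The sketch below covers one half of a real cleanup: it removes the `Do` invocations for given resource names from a decoded content-stream string. `strip_xobject_calls` is a hypothetical helper, not part of CAM::PDF.

```perl
use strict;
use warnings;

# Remove "/Name Do" invocations for the given XObject resource names from a
# decoded content-stream string. Hypothetical helper, not a CAM::PDF method.
sub strip_xobject_calls {
    my ( $content, @names ) = @_;
    for my $name (@names) {
        # A name operand, whitespace, then the Do operator. Requiring
        # whitespace right after the name keeps "Im1" from matching "/Im10".
        $content =~ s{/\Q$name\E\s+Do(?![A-Za-z0-9])}{}g;
    }
    return $content;
}
```

For example, `strip_xobject_calls("q 100 0 0 50 0 0 cm /Im1 Do Q\n", 'Im1')` leaves `"q 100 0 0 50 0 0 cm  Q\n"`. With CAM::PDF the decoded page content can be read with getPageContent($pagenum) and written back with setPageContent($pagenum, $content), and the matching entries would also have to be removed from the page's /Resources /XObject dictionary. The regex does not parse PDF syntax, so matching text inside string literals, or images drawn from inside form XObjects, are not handled; that is part of why replacing the images is the simpler route.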
First, note that we can use ImageMagick to generate a blank PDF that contains nothing but a transparent image:
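A sketch of that step in Perl, assuming the PerlMagick module (Image::Magick) is installed; on the command line the equivalent is along the lines of `convert -size 1x1 xc:none transparent.pdf` (`magick` instead of `convert` on ImageMagick 7). `xc:none` is ImageMagick's solid-colour canvas filled with the fully transparent colour `none`. Some distributions ship a policy.xml that disables ImageMagick's PDF coder, in which case the write fails with a security-policy error. Object numbering in the result may vary between ImageMagick versions, so it is worth checking the file.

```perl
use strict;
use warnings;
use Image::Magick;

# Build a 1x1 canvas filled with "none" (fully transparent) and save it as a
# PDF. PerlMagick methods return an error string on failure, empty on success.
my $img = Image::Magick->new;
$img->Set( size => '1x1' );

my $err = $img->Read('xc:none');
die "$err" if "$err";

$err = $img->Write('transparent.pdf');
die "$err" if "$err";
```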
If we view the generated PDF in a text editor, we can find the main image object:
The important thing to note here is that we have generated a transparent image as object number 8.
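Rather than reading the number off by eye, a rough heuristic can list candidate image objects from the raw file. `find_image_objects` is a hypothetical helper. It relies on image XObjects being stream objects, which the PDF specification does not allow inside compressed object streams, so each one's dictionary appears as plain `N G obj << ... >> stream` text. It is a regex scan, not a parser: unusual layouts (an indirect /Subtype, or binary stream data that happens to look like an object header) can fool it.

```perl
use strict;
use warnings;

# Return the object numbers whose dictionary contains /Subtype /Image.
sub find_image_objects {
    my ($raw) = @_;
    my @found;
    # "N G obj", then everything up to the first "stream" or "endobj".
    while ( $raw =~ /(\d+)\s+\d+\s+obj\b(.*?)(?:\bstream\b|\bendobj\b)/gs ) {
        my ( $num, $dict ) = ( $1, $2 );
        push @found, $num if $dict =~ m{/Subtype\s*/Image\b};
    }
    return @found;
}

# Example: perl find-images.pl transparent.pdf
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "$ARGV[0]: $!";
    my $raw = do { local $/; <$fh> };
    print join( "\n", find_image_objects($raw) ), "\n";
}
```

Note that ImageMagick may also write a soft-mask image for the transparency; that object matches too, and the one to import is the image whose dictionary references the mask via /SMask.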
It then becomes a matter of importing this object and using it to replace each of the real images in the PDF, effectively blanking them.
The script now replaces each image in the PDF with the imported transparent image object (object number 8 from transparent.pdf).
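A minimal sketch of such a script, not necessarily the one used here. It assumes CAM::PDF's `replaceObject($objnum, $otherdoc, $otherobjnum, $follow)`, which overwrites object `$objnum` with a copy of object `$otherobjnum` from another document; a true `$follow` is intended to bring along the objects that one refers to (for example its soft mask). `blank_images` is a hypothetical helper, and the default of 8 only reflects the object number seen in transparent.pdf above, so check your own file.

```perl
use strict;
use warnings;

# Replace every image XObject in $pdf with object $blank_num copied from
# $blank (both CAM::PDF documents). Returns how many objects were replaced.
# blank_images is a hypothetical helper name, not part of CAM::PDF.
sub blank_images {
    my ( $pdf, $blank, $blank_num ) = @_;
    my $count = 0;

    foreach my $objnum ( sort { $a <=> $b } keys %{ $pdf->{xref} } ) {
        my $xobj = $pdf->dereference($objnum) or next;
        next unless $xobj->{value}->{type} eq 'dictionary';

        my $im = $xobj->{value}->{value};
        next unless defined $im->{Type} and defined $im->{Subtype};
        next unless $pdf->getValue( $im->{Type} ) eq 'XObject'
            and $pdf->getValue( $im->{Subtype} ) eq 'Image';

        # Overwrite the image in place under the same object number, so
        # existing references stay valid; 1 asks to follow its dependencies.
        $pdf->replaceObject( $objnum, $blank, $blank_num, 1 );
        $count++;
    }
    return $count;
}

# Usage: perl blank-images.pl input.pdf transparent.pdf [objnum] > out.pdf
if ( @ARGV >= 2 ) {
    require CAM::PDF;
    my ( $in, $blank_file, $blank_num ) = @ARGV;
    $blank_num //= 8;    # object number found in transparent.pdf above

    my $pdf   = CAM::PDF->new($in)         or die $CAM::PDF::errstr;
    my $blank = CAM::PDF->new($blank_file) or die $CAM::PDF::errstr;
    blank_images( $pdf, $blank, $blank_num );
    $pdf->cleanoutput('-');
}
```

Because each image object is overwritten under its original number, the page resources and `Do` operators that pointed at it still resolve to an image, which should avoid the kind of errors mupdf reported for the deleteObject() version.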