I am working on identifying forgery/tampering in bank statement PDF documents. Info metadata and XMP metadata are not always present in the PDFs that I have, so I am not able to create any generalized rule to identify tampered PDFs. I am using Python libraries such as PyMuPDF, PDFMiner, PyPDF2, etc.
I have 2 questions:
- Is there any concrete way to identify whether a PDF has been tampered with (using Python or any other open source technology)?
- If the PDF has been tampered with, which part of it was changed (using Python or any other open source technology)?
Attaching 2 PDFs for reference -
original :- "sbi statment_out2.pdf" link - https://drive.google.com/file/d/1DoWAKYcCudRO-Cwjbgf7RjiJUsF3DD3s/view?usp=sharing
Tampered using Sejda online editor :- "sbi statment_out2_Sejda_edited.pdf" link - https://drive.google.com/file/d/1J4eRy9tO3jN8AqEWNrKXtn40G6vdH5G3/view?usp=sharing
In the tampered PDF, I have edited '2,412.00' under the 'Credit' column to '12.00'.
Kindly let me know of any open source solution, preferably in Python.
Thanks.
The canonical way to ensure that a PDF has not been tampered with is to accept only PDFs that carry a digital signature by the originator and to validate those signatures, as Frank has already pointed out with a link to an Adobe forum.
Variations thereof could be
Such cryptographic methods are reasonably secure if implemented correctly.
Unfortunately these secure methods require that the producer of the PDF cooperates accordingly when publishing the PDFs.
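Before any real validation, one can at least check whether a file carries a signature at all. The following is a minimal sketch using only the standard library; the function name `looks_signed` is made up for illustration. It assumes that a signature dictionary shows up as a plain-text `/ByteRange [` entry in the raw file bytes. It only detects the presence of such an entry and does not validate anything; actual signature validation needs a proper PDF signature library.

```python
import re


def looks_signed(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the raw bytes contain a /ByteRange array.

    Assumption: signature dictionaries are stored uncompressed, so the
    key is visible in a raw byte scan. This says nothing about whether
    a signature is valid or whether it covers the whole file.
    """
    return re.search(rb"/ByteRange\s*\[", pdf_bytes) is not None


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python check_signed.py statement.pdf
    for name in sys.argv[1:]:
        print(name, "signature entry found:", looks_signed(Path(name).read_bytes()))
```

A file for which this returns False cannot be verified cryptographically at all, so it falls into the "producer does not cooperate" case.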
If the producer does not cooperate and simply publishes PDFs without such a cryptographic protection, you can still compare internal details of PDFs which should be created similarly. If such internal details differ considerably, either someone amateurishly tampered with the PDF or the PDF producer updated or switched the PDF production software.
In case of your example files there are numerous differences in such details, e.g.
Surely you can use Python PDF libraries to check for such details and determine divergences.
But beware, this way you will only catch dilettante forgers. Forgers who know their business will leave hardly any such traces in their outputs...
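The comparison of internal details described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names `fingerprint` and `diff_fingerprints` are made up, the choice of details (header version, number of `%%EOF` markers, cross-reference style, producer string, base font names) is mine, and the raw byte scan cannot see anything stored inside compressed object streams or in encrypted files. A library such as PyMuPDF would give more reliable access to the same details.

```python
import re


def fingerprint(pdf_bytes: bytes) -> dict:
    """Collect a few structural details from the raw bytes of a PDF."""
    header = re.search(rb"%PDF-(\d\.\d)", pdf_bytes[:1024])
    producer = re.search(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    fonts = re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", pdf_bytes)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        # More than one %%EOF marker usually indicates incremental updates.
        "eof_markers": pdf_bytes.count(b"%%EOF"),
        "uses_xref_stream": re.search(rb"/Type\s*/XRef", pdf_bytes) is not None,
        "uses_object_streams": re.search(rb"/Type\s*/ObjStm", pdf_bytes) is not None,
        "producer": producer.group(1).decode("latin-1") if producer else None,
        "font_names": sorted({f.decode("latin-1") for f in fonts}),
    }


def diff_fingerprints(reference: dict, candidate: dict) -> dict:
    """Return {detail: (reference_value, candidate_value)} for every mismatch."""
    return {
        key: (reference[key], candidate[key])
        for key in reference
        if reference[key] != candidate[key]
    }


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python compare_pdfs.py known_good.pdf suspect.pdf
    ref, cand = (fingerprint(Path(p).read_bytes()) for p in sys.argv[1:3])
    for key, (a, b) in diff_fingerprints(ref, cand).items():
        print(f"{key}: {a!r} -> {b!r}")
```

The reference fingerprint should come from a statement known to be genuine and produced by the same bank in the same period. A mismatch is a reason to look closer, not proof of forgery, and a match proves nothing against a careful forger.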