I am working on identifying forgery/tampering in bank statement PDF documents. Info metadata and XMP metadata are not always present in the PDFs that I have, so I am not able to create any generalized rule to identify tampered PDFs. I am using Python libraries such as PyMuPDF, PDFMiner, PyPDF2, etc.
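As a quick first check, the Info dictionary and the XMP packet can be located directly in the raw bytes when they are present. This is a heuristic sketch: the helper name `raw_metadata` is hypothetical, and it only finds literal-string values stored uncompressed. It misses hex strings, escaped parentheses and anything inside compressed object streams, so PyMuPDF's `Document.metadata` is the more robust way to read the same fields.

```python
import re

# Info-dictionary keys worth inspecting; /ModDate or an unexpected
# /Producer can hint that a different tool touched the file.
INFO_KEYS = (b"Producer", b"Creator", b"CreationDate", b"ModDate")

def raw_metadata(data: bytes) -> dict:
    """Heuristic scan of raw PDF bytes for Info entries and an XMP packet."""
    found = {}
    for key in INFO_KEYS:
        # Matches literal strings such as "/Producer (Tool 1.0)" only; hex
        # strings, escaped parentheses and compressed objects are missed.
        m = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if m:
            found[key.decode()] = m.group(1).decode("latin-1")
    # XMP metadata is an XML packet delimited by <?xpacket ...?> markers
    found["has_xmp"] = b"<?xpacket begin" in data
    return found
```

Missing keys simply do not appear in the result, which matches the situation described above where metadata is often absent.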
I have two questions:
- Is there any concrete way to identify whether a PDF has been tampered with (using Python or any other open-source technology)?
- If the PDF has been tampered with, how can I find which part of it was changed (using Python or any other open-source technology)?
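One structural signal that does not depend on metadata is the incremental update. When a PDF is saved incrementally, the editor appends new objects, a new cross-reference section and a new trailer ending in `%%EOF`, leaving the earlier bytes untouched. Counting those markers can hint at edits, and slicing at them recovers the earlier revisions. A sketch with hypothetical helper names; caveats: linearized ("fast web view") files typically contain an extra `%%EOF` near the start, a tool that rewrites the whole file leaves only one marker (so a single `%%EOF` proves nothing), and the byte sequence could in principle occur inside binary stream data.

```python
import re

def revision_boundaries(data: bytes) -> list[int]:
    """Byte offsets just past each %%EOF marker (heuristic)."""
    # An incremental save appends new objects, a new cross-reference section
    # and a new trailer ending in %%EOF, so a file edited this way contains
    # several markers. data[:offset] is the file as it was at that save.
    return [m.end() for m in re.finditer(rb"%%EOF", data)]


def earlier_revisions(data: bytes) -> list[bytes]:
    """Every revision except the last, as standalone byte strings."""
    return [data[:end] for end in revision_boundaries(data)[:-1]]
```

Each earlier revision can be written to its own file and opened with any PDF reader or parser, then compared with the current version to see what changed.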
I am attaching two PDFs for reference:
- Original: "sbi statment_out2.pdf", link: https://drive.google.com/file/d/1DoWAKYcCudRO-Cwjbgf7RjiJUsF3DD3s/view?usp=sharing
- Tampered using the Sejda online editor: "sbi statment_out2_Sejda_edited.pdf", link: https://drive.google.com/file/d/1J4eRy9tO3jN8AqEWNrKXtn40G6vdH5G3/view?usp=sharing
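When a genuine copy is available to compare against, as with the two files above, the changed part can be located by diffing the extracted text line by line. A sketch using the standard library's `difflib`; the helper name `changed_lines` is hypothetical, and the text extraction step (for example `page.get_text().splitlines()` in PyMuPDF) is assumed to happen beforehand.

```python
from difflib import SequenceMatcher
from itertools import zip_longest

def changed_lines(original: list[str], edited: list[str]) -> list[tuple[str, str]]:
    """Return (original, edited) pairs for every line that differs."""
    matcher = SequenceMatcher(a=original, b=edited, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # "replace", "delete" and "insert" all reduce to pairing the two
        # slices; a missing side is reported as an empty string.
        changes.extend(zip_longest(original[i1:i2], edited[j1:j2], fillvalue=""))
    return changes
```

This compares text only, so a change that keeps the text identical (for example a moved or restyled value) will not show up.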
In the tampered PDF, I changed '2,412.00' under the 'Credit' column to '12.00'.
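Without an original to compare against, one assumption worth testing is that an edited value is drawn with a different font or size than the surrounding text (I have not verified whether Sejda does this). PyMuPDF's `page.get_text("dict")` returns blocks, lines and spans, where each span carries `text`, `font` and `size`. The hypothetical helper below works on a flat list of such span dicts and flags styles that are rare on the page; the 5% threshold is arbitrary.

```python
from collections import Counter

def font_outliers(spans: list[dict], max_share: float = 0.05) -> list[dict]:
    """Return spans whose (font, size) pair is rare on the page."""
    # Round sizes so tiny float differences don't split one style into many
    styles = [(s["font"], round(s["size"], 1)) for s in spans]
    counts = Counter(styles)
    total = len(spans)
    return [
        span for span, style in zip(spans, styles)
        if counts[style] / total <= max_share
    ]
```

The flat list can be built from a PyMuPDF page with `[s for b in page.get_text("dict")["blocks"] for l in b.get("lines", []) for s in l["spans"]]`; image blocks have no `"lines"` key, hence the `get`. Legitimate headers and bold labels will also be rare styles, so the output is a list of candidates to inspect, not a verdict.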
Please let me know if there is any open-source solution, preferably in Python.
Thanks.
Adobe says that there is no way of detecting whether a PDF has been modified unless it is digitally signed:
https://community.adobe.com/t5/acrobat-reader/how-to-detect-a-modified-pdf-file/td-p/3546278
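For signed files, a structural check is possible. A signature's `/ByteRange` array `[a b c d]` lists two byte ranges (`a` to `a+b` and `c` to `c+d`) that together cover the signed revision except the signature value itself, so if `c + d` is smaller than the file size, bytes were appended after signing. A sketch, assuming the signature dictionary is stored uncompressed (the helper name is hypothetical). It does not verify the cryptographic signature; an open-source library such as pyHanko can do that. A `False` entry only means later data exists, which may be a legitimate second signature or form fill rather than tampering.

```python
import re

BYTE_RANGE = rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"

def signature_coverage(data: bytes) -> list[bool]:
    """For each /ByteRange found, report whether it spans to the end of the file."""
    results = []
    for m in re.finditer(BYTE_RANGE, data):
        start1, _len1, start2, len2 = (int(g) for g in m.groups())
        # The two ranges should start at byte 0 and end exactly at EOF,
        # skipping only the /Contents hex string holding the signature.
        results.append(start1 == 0 and start2 + len2 == len(data))
    return results
```

An empty list means no signature dictionary was found, which is the unsigned case described above.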