I need to compare two PDF files for equality. The two files need to be identical in content, and I'm not having any success with the proposals found on:
https://stackoverflow.com/a/36108862/2807741
    public static bool AreFileContentsEqual(String path1, String path2) =>
        File.ReadAllBytes(path1).SequenceEqual(File.ReadAllBytes(path2));
and
https://stackoverflow.com/a/76917554/2807741
    private bool AreFilesEqual(string file1Path, string file2Path)
    {
        string file1Hash, file2Hash;
        // SHA1.Create() is used because SHA1CryptoServiceProvider is marked obsolete in newer .NET versions.
        using (SHA1 sha = SHA1.Create())
        {
            using (FileStream fs = System.IO.File.OpenRead(file1Path))
            {
                file1Hash = Convert.ToBase64String(sha.ComputeHash(fs));
            }
            using (FileStream fs = System.IO.File.OpenRead(file2Path))
            {
                file2Hash = Convert.ToBase64String(sha.ComputeHash(fs));
            }
        }
        // The hashes match only if the two files are byte-for-byte identical.
        return file1Hash == file2Hash;
    }
(among other links I've tried).
I'm comparing two "identical" files, and both methods always return false. The only case where they return true is when I compare a file with itself.
I created the files to compare as follows:
- Word > Write any content > "Save as" > PDF.
- Keep the content intact and "Save as" > PDF (with a different name).
Maybe something changes in the second file when saving, even though I'm not making any modifications to it?
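To check that suspicion, a small helper can report where the two files first diverge. This is only a sketch using the standard library; the `PdfByteDiff` class and the `FirstDifference` method are hypothetical names, not taken from either linked answer. PDF exporters commonly write a creation timestamp and a document identifier into each file, so two exports of the same document would be expected to differ at some offset even when the visible content is the same.

```csharp
using System;
using System.IO;

public static class PdfByteDiff
{
    // Returns the offset of the first differing byte,
    // or -1 if the two files are byte-for-byte identical.
    public static long FirstDifference(string path1, string path2)
    {
        using (FileStream a = File.OpenRead(path1))
        using (FileStream b = File.OpenRead(path2))
        {
            long offset = 0;
            while (true)
            {
                int x = a.ReadByte(); // -1 signals end of file
                int y = b.ReadByte();
                if (x != y) return offset; // also covers one file ending before the other
                if (x == -1) return -1;    // both files ended at the same position
                offset++;
            }
        }
    }
}
```

Calling `PdfByteDiff.FirstDifference("file1.pdf", "file2.pdf")` and opening both files in a hex or text editor at the returned offset should show what actually changed between the two exports.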
Edit 1:
When I say "Identical" I mean identical in content. The PDFs will contain amounts (numbers), and those amounts in the PDF bills must be exactly the same.
OK, I'll answer my own question. iText7 is the way to go, as it can read the content of a PDF file as text, so the comparison can be made on the extracted text instead of on the raw bytes.
NuGet package: https://www.nuget.org/packages/itext7
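A minimal sketch of how the text comparison might look, assuming the itext7 package is referenced. `PdfTextComparer`, `ExtractText` and `HaveSameText` are hypothetical names chosen for illustration; the iText calls used are `PdfReader`, `PdfDocument`, `GetNumberOfPages`, `GetPage` and `PdfTextExtractor.GetTextFromPage`.

```csharp
using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

public static class PdfTextComparer
{
    // Extracts the text of every page, in page order, into a single string.
    public static string ExtractText(string path)
    {
        var text = new StringBuilder();
        PdfDocument pdf = new PdfDocument(new PdfReader(path));
        try
        {
            // iText page numbers are 1-based.
            for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(pdf.GetPage(page)));
            }
        }
        finally
        {
            pdf.Close(); // releases the underlying file
        }
        return text.ToString();
    }

    // True when both PDFs yield exactly the same extracted text.
    public static bool HaveSameText(string path1, string path2) =>
        string.Equals(ExtractText(path1), ExtractText(path2), StringComparison.Ordinal);
}
```

Note that this compares only the extracted text, which covers the amounts on the bills. Differences in images, fonts or layout would not be detected.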