.NET C# PDF file comparison not working with any method


I need to compare two PDF files for equality. The two files need to be identical in content, and I'm not having any success with the approaches proposed in:

https://stackoverflow.com/a/36108862/2807741

public static bool AreFileContentsEqual(String path1, String path2) =>
              File.ReadAllBytes(path1).SequenceEqual(File.ReadAllBytes(path2));

and

https://stackoverflow.com/a/76917554/2807741

private bool AreFilesEqual(string file1Path, string file2Path)
{
    string file1Hash = "", file2Hash = "";
    SHA1 sha = new SHA1CryptoServiceProvider();

    using (FileStream fs = System.IO.File.OpenRead(file1Path))
    {
        byte[] hash;
        hash = sha.ComputeHash(fs);
        file1Hash = Convert.ToBase64String(hash);
    }

    using (FileStream fs = System.IO.File.OpenRead(file2Path))
    {
        byte[] hash;
        hash = sha.ComputeHash(fs);
        file2Hash = Convert.ToBase64String(hash);
    }

    return (file1Hash == file2Hash);
}

(among other links I've tried).

I'm comparing two "identical" files and both methods always return false (the only case that returns true is comparing a file with itself).

I created the files to compare as follows:

  1. Word > Write any content > "Save as" > PDF.
  2. Keep the content intact and "Save as" > PDF (with a different name).

Maybe something changes in the second file when saving, even though I'm not making any modifications to it?

Edit 1:

When I say "Identical" I mean identical in content. The PDFs will contain amounts (numbers), and those amounts in the PDF bills must be exactly the same.

Best answer:

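OK, I'll answer my own question. iText7 is the way to go, because it can extract the content of a PDF file as text, so the comparison is made on what the document says rather than on its raw bytes. The byte and hash comparisons most likely fail because each "Save as" writes file-specific metadata (such as the creation timestamp and the document identifier) into the PDF, so two exports of the same Word document are not byte-identical.

NuGet package: https://www.nuget.org/packages/itext7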

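using System.IO;
using System.Reflection;
using System.Text;
using Microsoft.AspNetCore.Mvc;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public IActionResult Index()
{
    // The test PDFs live in an "Assets" folder next to the executing assembly.
    var exeFilePath = Assembly.GetExecutingAssembly().Location;
    var workPath = Path.Combine(Path.GetDirectoryName(exeFilePath), "Assets");

    var file1 = Path.Combine(workPath, "testpdfv1.pdf");
    var file2a = Path.Combine(workPath, "testpdfv2equalv1.pdf");
    var file2b = Path.Combine(workPath, "testpdfv2differentv1.pdf");

    var fileContents1 = PdfToText(file1);
    var fileContents2 = PdfToText(file2a); // pass file2b instead to try the "different" case

    // Compare the extracted text instead of the raw bytes.
    var filesAreEqual = fileContents1 == fileContents2;

    return View();
}

private string PdfToText(string pdfFilePath)
{
    var pdfDocument = new PdfDocument(new PdfReader(pdfFilePath));
    try
    {
        var result = new StringBuilder();
        for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
        {
            // Use a fresh strategy per page: a reused instance accumulates
            // the text of earlier pages, so their text would be repeated.
            var strategy = new LocationTextExtractionStrategy();
            result.Append(PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy));
        }
        return result.ToString();
    }
    finally
    {
        pdfDocument.Close(); // release the file even if extraction throws
    }
}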
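
A possible refinement, offered only as a sketch: comparing the extracted strings with == is sensitive to layout details such as extra spaces or line breaks, and the question is really about the amounts matching. The snippet below collapses whitespace before comparing and also compares only the numbers found in the text. NormalizeWhitespace and ExtractAmounts are hypothetical helper names, the sample strings are made up and merely stand in for the output of PdfToText, and the number pattern assumes '.' as the decimal separator with no thousands separators.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse every run of whitespace
// (spaces, tabs, line breaks) into a single space.
static string NormalizeWhitespace(string text) =>
    Regex.Replace(text, @"\s+", " ").Trim();

// Hypothetical helper: return the number-like tokens in reading order.
// Assumption: '.' is the decimal separator and there are no thousands separators.
static List<decimal> ExtractAmounts(string text) =>
    Regex.Matches(text, @"\d+(?:\.\d+)?")
         .Cast<Match>()
         .Select(m => decimal.Parse(m.Value, CultureInfo.InvariantCulture))
         .ToList();

// Made-up sample strings standing in for PdfToText results.
var textA = "Invoice 42\nTotal:   1234.50 EUR";
var textB = "Invoice 42 Total: 1234.50  EUR\n";
var textC = "Invoice 42 Total: 1234.60 EUR";

// Same words, different spacing: equal after normalization.
Console.WriteLine(NormalizeWhitespace(textA) == NormalizeWhitespace(textB)); // True

// Same layout, different amount: the number sequences differ.
Console.WriteLine(ExtractAmounts(textA).SequenceEqual(ExtractAmounts(textC))); // False
```

Comparing parsed decimal values rather than strings also treats 1234.5 and 1234.50 as the same amount; whether that is desirable depends on how strict the bill check has to be.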