GemBox DocumentModel.Load() cannot read Pdf file

Currently I am unable to load the original PDF document using GemBox; it gives me the error shown in the image below. I am using Acrobat 9.

I have also tried the 8/16/2018 fixes. Any suggestions would be highly appreciated.

The basic code I am using is:

using GemBox.Document;
using System;

namespace Pdf2Text
{
    class Program
    {
        [STAThread]
        static void Main(string[] args)
        {
            ComponentInfo.SetLicense("My-License");

            DocumentModel document = DocumentModel.Load(@"E:\data\testing\HA021.pdf");
            document.Save(@"E:\data\testing\HA021.docx");
        }
    }
}

There is 1 best solution below


EDIT:

In newer versions of GemBox.Document there is another PDF reader, intended for high-fidelity tasks; see Convert PDF to Word.

Here is how to use it:

var document = DocumentModel.Load("Sample.pdf",
    new PdfLoadOptions() { LoadType = PdfLoadType.HighFidelity });
document.Save("Sample.docx");

ORIGINAL:

The current implementation of the PDF reader in GemBox.Document is still in beta and cannot handle one feature used by this PDF: "iref streams", which are cross-reference tables stored in streams (cross-reference streams).

However, GemBox.Pdf can handle cross-reference streams, so as a workaround you could do something like the following:

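// Requires: using System.IO; using GemBox.Document; using GemBox.Pdf;

// Load PDF with GemBox.Pdf.
var pdfDocument = PdfDocument.Load("Sample.pdf");
// Write a classic cross-reference table instead of a cross-reference stream.
pdfDocument.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Table;

// Save PDF with GemBox.Pdf.
var pdfStream = new MemoryStream();
pdfDocument.Save(pdfStream);

// Rewind so the saved PDF is read from its first byte.
pdfStream.Position = 0;

// Load PDF with GemBox.Document.
var document = DocumentModel.Load(pdfStream, LoadOptions.PdfDefault);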
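To check whether a given file is affected before applying the workaround, here is a rough sketch that uses only the .NET base library, not GemBox. It relies on the PDF file layout: the number after the last startxref keyword is the byte offset of either an "xref" keyword (classic table) or an indirect object such as "12 0 obj" (cross-reference stream). The XrefProbe class and DetectCrossReferenceType method are hypothetical names made up for this illustration. Hybrid files, which carry both a table and a stream, would be reported as "table".

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any GemBox API.
static class XrefProbe
{
    // Returns "table", "stream", or "unknown" for the given PDF bytes.
    public static string DetectCrossReferenceType(byte[] pdf)
    {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);

        int marker = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (marker < 0) return "unknown";

        // The decimal byte offset of the last cross-reference section follows the keyword.
        int pos = marker + "startxref".Length;
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        int start = pos;
        while (pos < text.Length && char.IsDigit(text[pos])) pos++;

        if (pos == start) return "unknown";
        if (!int.TryParse(text.Substring(start, pos - start), out int offset)) return "unknown";
        if (offset >= text.Length) return "unknown";

        // A classic table starts with "xref"; a cross-reference stream starts
        // with an object header, whose first character is a digit.
        if (string.CompareOrdinal(text, offset, "xref", 0, 4) == 0) return "table";
        return char.IsDigit(text[offset]) ? "stream" : "unknown";
    }
}
```

It could be called as XrefProbe.DetectCrossReferenceType(System.IO.File.ReadAllBytes("Sample.pdf")). Reading the whole file into memory keeps the sketch short; for large files, reading only the tail and then seeking to the offset would be cheaper.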

Lastly, regarding the conversion of PDF to DOCX: GemBox.Document's PDF reader is currently intended for extracting text and tables from PDF files; it is not intended for any high-fidelity requirement.
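Since the reader is aimed at text extraction, a minimal sketch of that use case might look like the following. It assumes the PDF loads successfully (for example after the workaround above); the file names are placeholders, and "FREE-LIMITED-KEY" is the free-mode key, to be replaced with your own license.

```csharp
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // Free-mode key; substitute your own license key here.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Placeholder path, not the file from the question.
        var document = DocumentModel.Load("Sample.pdf");

        // Content.ToString() returns the document's text without formatting.
        File.WriteAllText("Sample.txt", document.Content.ToString());
    }
}
```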