Recreate pdf with Acrobat


I am trying to extract the text of a PDF via iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage, but this fails because the file is malformed around an inline image.

I figured out that I can fix the problem in two ways: (A) open the PDF in Adobe Acrobat and save it as an optimized PDF, or (B) open it in Adobe Acrobat and print it to a new PDF via the Adobe PDF printer. After either step, the parsing works.

Now I have 14,000 of these files and want to automate (A) or (B), but so far I have not succeeded.
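
Independent of which fix ends up working, the batch part of the job can be sketched as below. This is only a rough sketch under assumptions: the class and method names (PdfBatch, TargetPathFor, PendingFiles) are made up for illustration, and the "_changed.pdf" suffix simply follows the naming used in the snippets in this question. It skips files that already have an output, so an interrupted run over 14,000 files can be resumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PdfBatch
{
    // Derives "<name>_changed.pdf" in the same folder as the source file.
    public static string TargetPathFor(string sourcePath)
    {
        string dir = Path.GetDirectoryName(sourcePath) ?? "";
        string name = Path.GetFileNameWithoutExtension(sourcePath);
        return Path.Combine(dir, name + "_changed.pdf");
    }

    // Yields (source, target) pairs for every PDF that has no output yet.
    public static IEnumerable<KeyValuePair<string, string>> PendingFiles(string rootFolder)
    {
        foreach (string source in Directory.EnumerateFiles(rootFolder, "*.pdf", SearchOption.AllDirectories))
        {
            // Files written by an earlier run are outputs, not inputs.
            if (source.EndsWith("_changed.pdf", StringComparison.OrdinalIgnoreCase))
                continue;

            string target = TargetPathFor(source);
            if (!File.Exists(target))
                yield return new KeyValuePair<string, string>(source, target);
        }
    }
}
```

Each pair's Key is the source path and its Value is the target path; the per-file step, (A) or (B), would be called inside a foreach over PdfBatch.PendingFiles(folder).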

For (A), I referenced the Adobe library and, in short, do something like this:

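// Acrobat COM interop objects (requires a reference to the Acrobat type library)
mApp = new AcroAppClass();
avDoc = new AcroAVDocClass();
avDoc.Open(strFilePath, "");
pdDoc = (CAcroPDDoc)avDoc.GetPDDoc();
// 1 = PDSaveFull; strips the ".pdf" extension and appends "_changed.pdf"
pdDoc.Save(1, strFilePath.Substring(0, strFilePath.Length - 4) + "_changed.pdf");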

But the Adobe SDK does not allow me to save in a different format (such as an optimized PDF) this way.

For (B), I tried something like this:

Process pdfProcess = new Process();
pdfProcess.StartInfo.FileName = @"C:\Program Files (x86)\Adobe\Acrobat 11.0\Acrobat\AcroRd32.exe";
// The format string needs {0}..{3} placeholders; without them only "/t" is passed.
pdfProcess.StartInfo.Arguments = string.Format("/t \"{0}\" \"{1}\" \"{2}\" \"{3}\"", strFilePathSource, "Adobe PDF", "Adobe PDF", strFilePathTarget);
pdfProcess.Start();

This does not throw any error, but no file is produced either.
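
To check what is actually handed to the executable, the argument string can be built and inspected separately from starting the process. The following is a minimal sketch, assuming the /t switch takes the file, printer name, driver name and port name in that order; the helper names (AcrobatPrint, BuildPrintArguments, PrintAndWait) are hypothetical, and whether the fourth value may be a target file path is not verified here.

```csharp
using System;
using System.Diagnostics;

static class AcrobatPrint
{
    // Wraps a value in double quotes so paths with spaces stay one argument.
    static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Builds: /t "file" "printer" "driver" "port"
    public static string BuildPrintArguments(string pdfPath, string printer, string driver, string port)
    {
        return string.Join(" ", "/t", Quote(pdfPath), Quote(printer), Quote(driver), Quote(port));
    }

    // Starts the executable and waits up to timeoutMs for it to exit.
    // Returns false if the process could not be started or is still running.
    public static bool PrintAndWait(string exePath, string pdfPath, string printer,
                                    string driver, string port, int timeoutMs)
    {
        var info = new ProcessStartInfo(exePath, BuildPrintArguments(pdfPath, printer, driver, port));
        info.UseShellExecute = false;

        using (Process process = Process.Start(info))
        {
            return process != null && process.WaitForExit(timeoutMs);
        }
    }
}
```

Logging the result of BuildPrintArguments before each run makes it easy to see whether the paths and printer names really reach the command line. The timeout is there because the viewer may stay open after printing, in which case WaitForExit returns false.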
