Why can't I add the text when I convert my HTML file to PDF?


My goal is to convert a PDF file to HTML, then convert the HTML back to PDF. When I do this on a normal file that is not secured and has no password, it works perfectly. But when I do this on a secured file that has no password, it doesn't work. What should I do, and how can I fix this bug?

Here's my code that converts PDF to HTML and HTML to PDF.

<?php
$fileName = 'ar-11'; // Base name of the PDF, without the extension
$fileNameLower = strtolower($fileName);
$pdfFilePath = '/opt/lampp/htdocs/' . $fileNameLower . '.pdf'; // Input PDF path
$outputHtmlPath = $fileNameLower . '.html'; // Desired HTML output path

// Run a shell command, log its output if it exits non-zero, and return the exit status
function runCommand(string $command): int
{
    exec($command . ' 2>&1', $output, $status);
    if ($status !== 0) {
        error_log("Command exited with status {$status}: {$command}\n" . implode("\n", $output));
    }
    return $status;
}

// Create a temporary file for the unlocked PDF
$unlockedPdf = tempnam(sys_get_temp_dir(), 'unlocked_pdf_');

// Use qpdf to remove encryption (without a password this only works when the user password is empty)
$qpdfDecryptCommand = 'qpdf --decrypt ' . escapeshellarg($pdfFilePath) . ' ' . escapeshellarg($unlockedPdf);
// qpdf exits with 0 on success and 3 on success with warnings
if (!in_array(runCommand($qpdfDecryptCommand), [0, 3], true)) {
    unlink($unlockedPdf);
    exit('qpdf could not decrypt the PDF.');
}

// Use pdf2htmlEX to convert the unlocked PDF to HTML
$command = 'pdf2htmlEX --process-outline 0 --fit-width 1024 --space-as-offset 1 '
    . escapeshellarg($unlockedPdf) . ' ' . escapeshellarg($outputHtmlPath);
runCommand($command);

// Clean up the temporary file
unlink($unlockedPdf);

// No delay is needed here: exec() blocks until pdf2htmlEX has finished

// Create a PDF from the translated HTML using wkhtmltopdf
$pdfOutputPath = $fileNameLower . '-translated.pdf';

// The --enable-local-file-access option allows access to local files (the HTML)
$wkhtmltopdfCommand = 'wkhtmltopdf --enable-local-file-access '
    . escapeshellarg($outputHtmlPath) . ' ' . escapeshellarg($pdfOutputPath);
runCommand($wkhtmltopdfCommand);

echo 'Translation completed and saved as HTML and PDF.';

Here's an image of the content of the original PDF, "ar-11.pdf", that I use: File PDF Original

Here's an image of the PDF I converted back from HTML, in which the text is missing: HTML to PDF

Any advice would be appreciated.

Answer by K J

XFA forms are a specialist area, so they need dedicated handling (usually not free). The best overview of the issues is https://www.datalogics.com/access-xfa-forms-with-forms-flattener
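To check whether a given file is likely to be an XFA form before sending it through the conversion pipeline, a rough heuristic is to look for an /XFA key once the PDF has been decrypted and uncompressed. The sketch below assumes qpdf is installed and that the file can be decrypted without a password; the function names `containsXfaKey` and `pdfLooksLikeXfa` are hypothetical, and a plain text search like this can miss unusual files, so treat a negative result as a hint only.

```php
<?php
// Heuristic: does this (already uncompressed) PDF content mention an /XFA key?
// In an XFA form, the AcroForm dictionary carries an /XFA entry.
function containsXfaKey(string $pdfBytes): bool
{
    return preg_match('/\/XFA\b/', $pdfBytes) === 1;
}

// Hypothetical helper: rewrite the PDF with qpdf in QDF mode (decrypted and
// uncompressed) so that dictionary keys become visible as plain text,
// then run the heuristic above on the result.
function pdfLooksLikeXfa(string $pdfPath): bool
{
    $tmp = tempnam(sys_get_temp_dir(), 'qdf_');
    $cmd = 'qpdf --decrypt --qdf ' . escapeshellarg($pdfPath) . ' ' . escapeshellarg($tmp) . ' 2>&1';
    exec($cmd, $output, $status);

    // qpdf exits with 0 on success and 3 on success with warnings
    if ($status !== 0 && $status !== 3) {
        unlink($tmp);
        throw new RuntimeException("qpdf failed with status {$status}: " . implode("\n", $output));
    }

    $bytes = file_get_contents($tmp);
    unlink($tmp);

    return $bytes !== false && containsXfaKey($bytes);
}
```

Calling `pdfLooksLikeXfa('/opt/lampp/htdocs/ar-11.pdf')` before the conversion would let the script report that the file needs an XFA-aware tool instead of producing a PDF with missing text.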

There are good web/server-based systems that transfer the XML into workable HTML. Here is one such example with custom colours, which has a server command-line ability. There are others, such as Adobe (LiveCycle ES4), Apryse iText (https://kb.itextsupport.com/home/it7kb/faq/how-to-fill-xfa-form-using-itext-without-breaking-usage-rights), Aspose, Foxit, etc.

Convert all Forms
FormVu supports converting both AcroForms and XFA Forms. All forms are converted into HTML5/CSS and JavaScript preserving style and layout.
