I am trying to have my Excel VBA code read an interactive PDF (i.e. a PDF with dropdowns, calendars...etc.). I normally read PDF files by opening them in Word; however, Word removes all the interactive controls while rendering the file (e.g. a calendar control gets completely removed...along with its value).
I know that I can "flatten" the PDF by opening it in Adobe Acrobat Reader (or any browser) and printing it to a PDF (thus replacing all the controls with their selected values). The question is how do I do this programmatically in VBA. I know there are thousands of questions on S.O. about this topic; but, all of the answers either require installing Adobe Acrobat Pro, using a third-party installed application (e.g. CutePDF), submitting the file to an online API, or most commonly, using SendKeys or Win32 API calls to interact with the "Save As" dialog box that appears. The code I am writing will be distributed to multiple users; so, any additional application installations is out of the question (both Acrobat Pro and other third-party conversion tools). The files contain proprietary data; so, an online API is out as well. As far as SendKeys or SendMessage is concerned...frankly, there's got to be a better way.
I've tried numerous methods including Chrome's headless "print-to-pdf"; but it appears that only accepts HTML files (I even tried embedding the PDF into an HTML file but it still did not work). The closest I've gotten is by using the below code to send data directly to the "Microsoft Print to PDF" driver (it's based on the process described here). The code successfully creates a PDF file...but it's a zero byte file. It seems that the data read from the input file is not being properly accepted by the "WritePrinter" function (although no error occurs). Although the documentation of the "WritePrinter" function states that the "pBuf" parameter should be a byte array, the person who asked this question was passing a string and getting it to work (understanding that they were not trying to print to a GDI printer). My code still just produces a blank PDF file if I convert the byte array to a string or even just read the contents of the input file using FSO's "ReadAll" method. It is also worth nothing that the return value of the "WritePrinter" function is "1" and the value of the "pcWritten" output is the correct number of bytes...it's just that the PDF file that is produced has a file size of 0 bytes.
So, can anyone figure out how to get the "WritePrinter" function to accept the data being read from the input file?
By the way, an enormous gold star to anyone who can figure this out because, based on my research, the internet is begging for a way to do this without using Acrobat Pro or having to interact with the "Save As" dialog!
UPDATE#1: I have found a few posts online where users are experiencing this issue of the "Microsoft Print to PDF" driver generating blank files manually (here and here). Apparently the primary culprit is special characters in the input file name. I wanted to make it clear that I do not believe this is the issue I am having, as I can print to PDF perfectly fine manually...just not via the code in this post. (Also, just so it's been said, neither the input file path/name nor output file path/name contains special characters). Also, I don't believe it's an issue with the specific install of the print driver (as suggested in some posts) as my code creates 0 byte files on 3 different computers with 3 different OS builds of Windows (18363.1082, 18363.1139, and 19041.572)
UPDATE#2: In my continued research of this issue I found this post on MSDN's Visual Studio help forums. I understand it's for C# but one of the contributers states:
So you should convert managed byte array into unmanaged array, then invoke the method
He provides C# code that uses the "Marshal.AllocCoTaskMem" and "Marshal.Copy" functions to "Copy the managed byte array into the unmanaged array". I'm not familiar with the terms "managed" or "unmanaged" byte arrays so I will continue to do some research on those. Does anyone have any experience with the "Marshal.AllocCoTaskMem" and "Marshal.Copy" functions from VBA within Excel (VB6)?
UPDATE#3: It has been brought to my attention that the code I've written will ONLY print XPS files to a PDF. I've converted my original interactive PDF to an XPS manually and confirmed that my code worked perfectly in writing that XPS to a PDF. This now leaves me back at square one: how do I programmatically read an interactive PDF without utilizing any third-party applications or online converters? Anyone have any ideas?
Type DOCINFO
lpszDocName As String
lpszOutput As String
lpszDatatype As String
End Type
Private Declare PtrSafe Function OpenPrinter Lib "winspool.drv" Alias "OpenPrinterA" (ByVal pPrinterName As String, phPrinter As Long, ByVal pDefault As Long) As Long
Private Declare PtrSafe Function StartDocPrinter Lib "winspool.drv" Alias "StartDocPrinterA" (ByVal hPrinter As Long, ByVal Level As Long, pDocInfo As DOCINFO) As Long
Private Declare PtrSafe Function StartPagePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function WritePrinter Lib "winspool.drv" (ByVal hPrinter As Long, pBuf As Any, ByVal cdBuf As Long, pcWritten As Long) As Long
Private Declare PtrSafe Function EndPagePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function EndDocPrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function ClosePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Public Sub Print_File_To_PDF(strInputFile As String, strOutputPDF As String)
Dim udtDocInfo As DOCINFO
Dim lngPrinterCheck As Long, lngFileNumber As Long, lngPrinterHandle As Long, lngPrinterOutput as Long
Dim arrBytes() As Byte
'Get handle of printer
lngPrinterCheck = OpenPrinter("Microsoft Print To PDF", lngPrinterHandle, 0)
If lngPrinterCheck = 0 Then
Exit Sub
End If
'Define document info
With udtDocInfo
.lpszDocName = CreateObject("Scripting.FileSystemObject").GetBaseName(strOutputPDF)
.lpszOutput = strOutputPDF
.lpszDatatype = "RAW"
End With
'Read file into byte array
lngFileNumber = FreeFile
Open strInputFile For Binary Access Read As lngFileNumber
ReDim arrBytes(LOF(lngFileNumber))
Get lngFileNumber, , arrBytes()
Close lngFileNumber
'Print byte array to PDF file
Call StartDocPrinter(lngPrinterHandle, 1, udtDocInfo)
Call StartPagePrinter(lngPrinterHandle)
Call WritePrinter(lngPrinterHandle, arrBytes(1), UBound(arrBytes), lngPrinterOutput)
Call EndPagePrinter(lngPrinterHandle)
Call EndDocPrinter(lngPrinterHandle)
Call ClosePrinter(lngPrinterHandle)
End Sub