python fillable pdf - space above field values when built from pypdf2, but not from pdftk


This is the workflow we've been using to generate a fillable pdf, then fill it in python. The filling process requires installation of pdftk, and we'd like to get rid of that dependency.

  1. generate .odt in LibreOffice Writer (7.3.4.2), with form fields ('controls')
  2. export as .pdf (LibreOffice can export a fillable pdf directly; with Word you would apparently need some post-processing in Acrobat, which we don't have, to turn the export into a fillable pdf)
  3. in python, generate .fdf with the filled values (using fdfgen)
  4. in python, call the external pdftk tool to merge the fdf with the fillable pdf, and generate a flattened (non-editable) .pdf

In other words, we'd like to not require pdftk or any other external tool, removing step 4 above (and potentially step 3 if .fdf is no longer needed).

We've tried the python modules pypdf2, pdfrw (actually pdfrw2), and fillpdf. pypdf2 and pdfrw both leave vertical space above and/or below the filled field value. That is a problem because the field backgrounds are opaque and we don't want to cover up the field labels, so the extra space reduces the usable vertical space on the form. (In the images below, the fields are shown with a gray background and borders to illustrate the problem; normally they are white with no border.) fillpdf does not seem to know that the field is a multi-line text field, and generates a clipped single-line result.
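One way to check whether a field is really marked as multi-line is to decode its /Ff (field flags) value. The sketch below is only an illustration: the helper names `describe_flags` and `with_read_only` are made up for this post, and the bit positions (bit 1 for read-only, bit 13 for multi-line) are taken from the field-flag tables in the PDF specification. The input would be whatever integer the library returns for /Ff on the field in question.

```python
# Bit positions from the PDF specification's field-flag tables (1-based bits).
READ_ONLY = 1 << 0    # bit 1: the field cannot be edited
MULTILINE = 1 << 12   # bit 13: a text field may hold several lines


def describe_flags(ff):
    """Return the names of the known flags set in a /Ff value (None counts as 0)."""
    ff = int(ff or 0)
    names = []
    if ff & READ_ONLY:
        names.append("ReadOnly")
    if ff & MULTILINE:
        names.append("Multiline")
    return names


def with_read_only(ff):
    """Set the read-only bit and leave every other bit unchanged."""
    return int(ff or 0) | READ_ONLY


print(describe_flags(4096))   # ['Multiline']
print(with_read_only(4096))   # 4097
```

If `describe_flags` reports "Multiline" for a field that fillpdf renders as a single clipped line, the flag is present in the file and the clipping comes from how the value is drawn, not from the form definition.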

The pdftk method does not result in the problematic added vertical space.

Is there a way to either a) get rid of this added vertical space in the LibreOffice PDF export, or b) get rid of the added vertical space in the filled pdf using a python-only solution, or c) set the field backgrounds to transparent, so that we can safely move the entire fields upwards, 'overlapping' but not blocking out the field labels?
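For option c), a possible starting point is the widget annotation's /MK (appearance characteristics) dictionary, whose /BG entry holds the background colour according to the PDF specification. The sketch below is an untested assumption for this particular LibreOffice export: `clear_widget_background` is a hypothetical helper, it works on any dict-like annotation object, and an existing appearance stream under /AP may still paint the fill, so removing /BG alone may not change what a viewer shows.

```python
def clear_widget_background(annotation):
    """Remove the /BG entry from a widget's /MK dictionary, if present.

    Returns True when an entry was removed, False otherwise.
    The annotation only needs to behave like a dict.
    """
    if "/MK" not in annotation:
        return False
    mk = annotation["/MK"]
    if "/BG" not in mk:
        return False
    del mk["/BG"]
    return True


# Plain dicts stand in for the library's dictionary objects here.
widget = {"/T": "titleField", "/MK": {"/BG": [1, 1, 1], "/BC": [0, 0, 0]}}
print(clear_widget_background(widget))  # True
print(widget["/MK"])                    # {'/BC': [0, 0, 0]}
```

In a real run the function would be called on each annotation object before writing the output file, and the result would need to be checked in the target viewers.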

Here's the document being edited in LibreOffice. Notice there is no vertical space above the default value: [image]

Here's the fillable .pdf exported from LibreOffice. Notice this does have the added vertical space, so one might think the issue lies with the LibreOffice PDF export, except that the space does not appear with the pdftk method below: [image]

Filled pdf, generated by pypdf2 (looks the same when generated from pdfrw):

[image]

Filled pdf, with the current workflow as above, using fdfgen and an external call to pdftk: [image]

For reference, here's the code I used to make the various filled pdf versions:

import json
import os

cluePdfName='filled.pdf'
clueFdfName='filled.fdf'
fillableClueReportPdfFileName='clueReportFillable.pdf'

fields={
    'titleField':'SEARCH AND RESCUE\nSEARCH AND RESCUE\nSEARCH AND RESCUE SEARCH AND RESCUE SEARCH AND RESCUE ',
    'incidentNameField':'incident',
    'instructionsCollectField':True,
    'instructionsOtherField':True,
    'descriptionField':'long description long description long description long description long description long description long description long description long description long description long description long description long description',
    'locationField':'long description long description long description long description long description long description long description long description long description long description long description long description long description',
    'locationRadioGPSField':'(Radio GPS: 12345  67890)',
    'instructionsOtherTextField':'do some stuff'}

################################################
#  PART 1: pypdf2
################################################

from PyPDF2 import PdfReader,PdfWriter
from PyPDF2.generic import NameObject,TextStringObject,NumberObject,BooleanObject

reader=PdfReader(fillableClueReportPdfFileName)
writer=PdfWriter()
page=reader.pages[0]
# print(str(page))
# print('annots:'+json.dumps(page['/Annots'],indent=3))
pdfFields=reader.get_fields()
# print('fields:'+json.dumps(pdfFields,indent=3))
writer.add_page(page)

# override PdfWriter.update_page_form_field_values
#  based on https://stackoverflow.com/a/48412434/3577105
# - fill text fields and boolean (checkbox '/Btn' fields)
# - set /AS to the same value, to address not-visible-until-clicked issues
# - set readonly flag for all fields afterwards
for annot_ref in page['/Annots']:
    # get_object() is the current name; getObject() is deprecated in newer PyPDF2 releases
    writer_annot = annot_ref.get_object()
    for field in fields:
        if writer_annot.get('/T') == field:
            val=fields[field]
            # bool must be tested before int, because bool is a subclass of int
            if isinstance(val,bool):
                # checkboxes want a NameObject, either /Yes or /Off - seems odd but it works
                valObj=NameObject('/Yes' if val else '/Off')
            elif isinstance(val,(str,int,float)):
                valObj=TextStringObject(str(val))
            else:
                valObj=TextStringObject('---') # placeholder for unsupported value types
            # print('updating '+str(field)+' --> '+str(fields[field])+' ['+type(val).__name__+':'+str(valObj)+']')
            print('updating '+str(field)+' --> '+str(valObj))
            writer_annot.update({
                NameObject("/V"): valObj,
                # NameObject("/AS"): valObj
                # NameObject('/Ff'): NumberObject(1) # set readonly flag for this field
            })
    ff=writer_annot.get('/Ff')
    if ff: # ff will not exist for all fields
        newff=ff|1 # set readonly flag for this field, without changing the other bits
        print('Ff: '+str(ff)+' --> '+str(newff))
        writer_annot.update({NameObject('/Ff'): NumberObject(newff)})
    else:
        writer_annot.update({NameObject('/Ff'): NumberObject(1)})

            
# writer.add_page(page)

with open(cluePdfName,'wb') as out:
    writer.write(out)


################################################
#  PART 2: fdfgen+pdftk
################################################

from fdfgen import forge_fdf

fdf=forge_fdf("",fields.items(),[],[],[])
with open(clueFdfName,"wb") as fdf_file:
    fdf_file.write(fdf)

cluePdfTkName=cluePdfName.replace('.pdf','_pdftk.pdf')
pdftk_cmd='pdftk "'+fillableClueReportPdfFileName+'" fill_form "'+clueFdfName+'" output "'+cluePdfTkName+'" flatten'
print("Calling pdftk with the following command:")
print(pdftk_cmd)
os.system(pdftk_cmd)


################################################
#  PART 3: pdfrw
# from https://akdux.com/python/2020/10/31/python-fill-pdf-files.html
# apparently, after doing pip install pdfrw2, 'import pdfrw' uses pdfrw2 (v0.5)
################################################

import pdfrw

ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'

def fill_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    for page in template_pdf.pages:
        annotations = page[ANNOT_KEY] or []  # pages without annotations return None
        for annotation in annotations:
            if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
                if annotation[ANNOT_FIELD_KEY]:
                    key = annotation[ANNOT_FIELD_KEY][1:-1]
                    if key in data_dict.keys():
                        if isinstance(data_dict[key], bool):
                            if data_dict[key]:
                                annotation.update(pdfrw.PdfDict(
                                    AS=pdfrw.PdfName('Yes')))
                        else:
                            annotation.update(
                                pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                            )
                            annotation.update(pdfrw.PdfDict(AP=''))
    template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)

fill_pdf(fillableClueReportPdfFileName,cluePdfName.replace('.pdf','_pdfrw.pdf'),fields)


################################################
#  PART 4: fillpdf
################################################
import fillpdf
from fillpdf import fillpdfs

fillpdfs.write_fillable_pdf(fillableClueReportPdfFileName,cluePdfName.replace('.pdf','_fillpdf.pdf'),fields,flatten=True)
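
The pdftk call in PART 2 builds one shell command string by concatenation and runs it with os.system. As long as pdftk remains in the workflow, a slightly more robust variant is to pass the arguments as a list to subprocess.run, which avoids manual quoting of file names that contain spaces. This is only a sketch: `build_pdftk_args` and `run_pdftk` are hypothetical helper names, and the pdftk arguments are the same ones used in the command above.

```python
import subprocess


def build_pdftk_args(form_pdf, fdf_path, output_pdf):
    """Return the pdftk argument list for filling and flattening a form."""
    return ["pdftk", form_pdf, "fill_form", fdf_path,
            "output", output_pdf, "flatten"]


def run_pdftk(form_pdf, fdf_path, output_pdf):
    """Run pdftk; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_pdftk_args(form_pdf, fdf_path, output_pdf), check=True)


# Only the argument list is built here; run_pdftk needs pdftk installed.
print(build_pdftk_args("clueReportFillable.pdf", "filled.fdf", "filled_pdftk.pdf"))
```

Calling `run_pdftk(fillableClueReportPdfFileName, clueFdfName, cluePdfTkName)` would then replace the os.system line, and a failed pdftk run would raise an exception and no longer pass silently.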