XML file doesn't get attached into the PDF while using ghostscript


My goal is to convert a PDF into a file that conforms to the Factur-X format.

I successfully converted a PDF into PDF/A-3b. Here's the code:

import subprocess

gs_path = r"C:\Program Files\gs\gs10.02.1\bin\gswin64.exe"

def convert_to_pdfa(input_path, output_path, pdfa_def_path):
    command = [
        gs_path,
        "-dPDFA=3",
        "-dBATCH",
        "-dNOPAUSE",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        "-dPDFACompatibilityPolicy=2",
        pdfa_def_path,
        input_path
    ]

    # check=True raises CalledProcessError if Ghostscript exits with an error
    subprocess.run(command, check=True)

if __name__ == "__main__":
    input_pdf_path = "facture.pdf"
    output_pdfa_path = "output_pdfa.pdf"
    pdfa_def_path = "PDFA_def.ps"

    convert_to_pdfa(input_pdf_path, output_pdfa_path, pdfa_def_path)

Here's the code in the PDFA_def.ps file:

% Define entries in the document Info dictionary :

/ICCProfile (sRGB_v4_ICC_preference.icc)
def

[ /Title (test)
/DOCINFO pdfmark

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {4} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

% Define the output intent dictionary :

[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent             % Must be so (the standard requires).
/S /GTS_PDFA1                   % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA}            % Must be so (see above).
/OutputConditionIdentifier (sRGBv4 ICC preference)
>> /PUT pdfmark

% Embed XML file:
[ /_objdef {InvoiceStream} /type /stream /OBJ pdfmark
[ {InvoiceStream} << /Type /EmbeddedFile /Subtype (text/xml) cvn /Params << /ModDate (D:20130121081433+01'00') >> >> /PUT pdfmark
[ {InvoiceStream} (output.xml) (r) file /PUT pdfmark
[ {InvoiceStream} /CLOSE pdfmark
[ /_objdef {Invoice_FSDict} /type /dict /OBJ pdfmark
[ {Invoice_FSDict} << /Type /FileSpec /F (output.xml) /UF (output.xml) /Desc (ZUGFeRD XML invoice) /AFRelationship /Alternative /EF << /F {InvoiceStream} /UF {InvoiceStream} >> >> /PUT pdfmark
[ /_objdef {AFArray} /type /array /OBJ pdfmark
[ {AFArray} {FSDict} /APPEND pdfmark
[ {Catalog} << /AF {AFArray} >> /PUT pdfmark
[ /Name (output.xml) /FS {FSDict} /EMBED pdfmark
[
/XML
(
...
)
/Ext_Metadata pdfmark

I followed this tutorial on the ZUGFeRD blog.

When I open the PDF, there's no attached XML file.

I compared the PDF I rendered with a PDF that follows the Factur-X format.

The PDF I rendered:

46 0 obj
<</Type/Metadata
/Subtype/XML/Length 1294>>stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
...
</x:xmpmeta>                                                                                                                                      
<?xpacket end='w'?>
endstream
endobj

Valid PDF:

8 0 obj
<<
/Filter /FlateDecode
/Subtype /XML
/Type /Metadata
/Length 978
>>
stream
    ... binary data ...
endstream
endobj
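
The two objects above are both XMP metadata streams, so they do not show whether a file is attached. A rough way to compare the two PDFs is to look for the markers an attachment normally leaves behind. The sketch below is only a heuristic and not a validator: the function name and the marker list are illustrative, and a plain byte scan can miss entries stored in compressed object streams or match stray bytes inside binary data.

```python
import re

# Illustrative markers: the embedded stream itself, the catalog's
# associated-files entry, and the EmbeddedFiles name tree.
MARKERS = {
    "embedded_file_stream": rb"/Type\s*/EmbeddedFile\b",
    "associated_files": rb"/AF\b",
    "embedded_files_tree": rb"/EmbeddedFiles\b",
}


def find_attachment_markers(pdf_bytes: bytes) -> dict:
    """Report which attachment-related markers occur in the raw PDF bytes."""
    return {
        name: re.search(pattern, pdf_bytes) is not None
        for name, pattern in MARKERS.items()
    }


# Example (file name taken from the question):
# from pathlib import Path
# print(find_attachment_markers(Path("output_pdfa.pdf").read_bytes()))
```

If all three markers are absent from the rendered PDF but present in the valid one, the embedding pdfmarks most likely never took effect.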

Hermann12 On

I don't see your subprocess call following the command line described in the Ghostscript documentation:

gs --permit-file-read=/usr/home/me/zugferd/ -sDEVICE=pdfwrite -dPDFA=3 \
-sColorConversionStrategy=RGB -sZUGFeRDXMLFile=/usr/home/me/zugferd/invoice.xml \
-sZUGFeRDProfile=/usr/home/me/rgb.icc -sZUGFeRDVersion=2p1 -sZUGFeRDConformanceLevel=BASIC \
-o /usr/home/me/zugferd/zugferd.pdf \
/usr/home/me/zugferd/zugferd.ps /usr/home/me/zugferd/original.pdf

There are also Factur-X Python libraries on PyPI.
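
Since the question drives Ghostscript from Python, the same command line can be assembled as an argument list for subprocess. This is a minimal sketch: the function name and its parameters are made up for illustration, and the switches are only the ones shown in the command above.

```python
import subprocess


def build_zugferd_command(gs_exe, permit_dir, xml_file, icc_profile,
                          output_pdf, zugferd_ps, source_pdf,
                          version="2p1", level="BASIC"):
    """Return the Ghostscript argument list for the zugferd.ps workflow."""
    return [
        gs_exe,
        f"--permit-file-read={permit_dir}",  # directory Ghostscript may read
        "-sDEVICE=pdfwrite",
        "-dPDFA=3",
        "-sColorConversionStrategy=RGB",
        f"-sZUGFeRDXMLFile={xml_file}",
        f"-sZUGFeRDProfile={icc_profile}",
        f"-sZUGFeRDVersion={version}",
        f"-sZUGFeRDConformanceLevel={level}",
        "-o", output_pdf,
        zugferd_ps,   # the PostScript program must come before the input PDF
        source_pdf,
    ]


def run_zugferd(command):
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting issues for paths that contain spaces.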

K J On

For Windows users struggling to get the syntax working: use this as a template command, then adapt it slowly until it works, at which point you can add -q (if desired).

From the installed GS files, you need a copy of

  • zugferd.ps
  • rgb.icc and/or cmyk.icc (ensure it is the correct type for your needs)
  • a pair of source.pdf and source.xml (here I call them invoice-0001)

The result should be invoice-0001-xml.pdf and the stated size 0 bytes as not a pdf.


gswin##c --permit-file-read="%CD%/" -sDEVICE=pdfwrite -dPDFA=3 -sColorConversionStrategy=RGB -sZUGFeRDProfile="%CD%\rgb.icc" -sZUGFeRDVersion=2p1 -sZUGFeRDConformanceLevel=BASIC -sZUGFeRDXMLFile="%CD%\invoice-0001.xml" -o"%CD%\invoice-0001-xml.pdf" zugferd.ps "%CD%\invoice-0001.pdf"

NOTES

  • gswin##c is the installed .exe for your system (or on your user environment path), where ## is either 32 or 64.
  • "%CD%/" is the current working directory, where all the input and output files are suggested to be kept together while testing (you can replace %CD% afterwards). Beware: for the permissions only, it MUST be terminated with a forward slash!
  • If using CMYK colour inks, then both the RGB value and the rgb.icc value need to be changed to CMYK, with the correct CMYK .icc profile.
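
When the command is built from Python rather than a batch file, the forward-slash rule for the permission switch can be handled in one small helper. This is a sketch that mirrors the "%CD%/" form above; permit_read_arg is a hypothetical name, not part of Ghostscript.

```python
import os


def permit_read_arg(directory: str) -> str:
    """Build --permit-file-read for a directory, ending in a forward slash."""
    # Drop any trailing slash or backslash, then add exactly one "/"
    return "--permit-file-read=" + directory.rstrip("\\/") + "/"


# Equivalent of --permit-file-read="%CD%/" for the current directory:
current_dir_arg = permit_read_arg(os.getcwd())
```

The rest of the path is left unchanged, as in the template command, so only the terminator differs from a normal Windows path.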


Once you trust a zero Length file it can run anything suitable.


So the file will usually run as a File in Edge.
