Convert XML file to PDF using XSLT and CSS (with free software license)


I'm trying to convert an XML report into a PDF using XSLT and CSS. The PDF should include page numbering in the footer and, for example, a page break after a particular table.

From what I found, this can be achieved using the CSS3 at-rules for paged media (e.g. @page). However, if I understand correctly, I might have trouble finding a tool that interprets these rules when creating the PDF (not to mention that it needs to convert from XML first).

I found that I could use the paged.js script to get this working in a browser, but it works only if I run a local server (e.g. live-server) because of the local file restrictions in web browsers. I can (kind of) overcome this using command-line switches like --allow-file-access-from-files, but then the document is printed before rendering is complete (it looks like the browser does not wait for the paged.js script to finish). I tried different switches: chrome.exe --headless --disable-gpu --allow-file-access-from-files --run-all-compositor-stages-before-draw --virtual-time-budget=100000 --print-to-pdf="<destination>" "<source>". Perhaps some switches for the JavaScript engine could help?
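
For running this from a program rather than by hand, the invocation quoted above could be assembled as an argument list. The following is only a sketch: the function names `build_print_command` and `print_to_pdf` are made up for illustration, the switches are exactly the ones listed in the question, and it does nothing to make Chrome wait for paged.js, so the early-printing problem would remain.

```python
import subprocess


def build_print_command(chrome, source, destination, time_budget_ms=100000):
    """Return the argv list for the headless print-to-PDF call from the question.

    `chrome` is the path to the Chrome executable, `source` the XML (or HTML)
    file to print, `destination` the PDF to write. All names are hypothetical.
    """
    return [
        str(chrome),
        "--headless",
        "--disable-gpu",
        "--allow-file-access-from-files",
        "--run-all-compositor-stages-before-draw",
        f"--virtual-time-budget={time_budget_ms}",
        f"--print-to-pdf={destination}",
        str(source),
    ]


def print_to_pdf(chrome, source, destination):
    """Run Chrome with the command above and return its exit code."""
    command = build_print_command(chrome, source, destination)
    return subprocess.run(command, check=False).returncode


# Build (but do not run) the command, so the sketch works without Chrome installed.
command = build_print_command("chrome.exe", "Results.xml", "Results.pdf")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.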

My question is: how can I programmatically convert the XML file to PDF, using XSLT to extract the data of interest from the XML and CSS to format the PDF as a proper paged document, using software whose license is free for commercial use? Do I need paged.js to accomplish it?

About my files: in my XML file, I reference local XSL files that extract particular data from the XML; they remove duplicates and sort the results by date. The XSL references a local CSS file that provides the formatting and the paged-media rules. It also references the paged.js script and its associated CSS, as described in the script's documentation.
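
To make the "remove duplicates and sort by date" step concrete, here is a rough sketch of the same logic in Python rather than XSL. It is not the stylesheet from the question. It assumes that a duplicate means a repeated SerialNumber, that the first occurrence is kept, and that Date and Time use the format seen in the sample data (e.g. 05-Mar-21 and 08:56:23). The sample records and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data in the same shape as the report file.
SAMPLE = """<Results>
    <Result ID="0"><SerialNumber>S-002</SerialNumber><Status>Pass</Status><Date>07-Mar-21</Date><Time>09:00:00</Time></Result>
    <Result ID="1"><SerialNumber>S-001</SerialNumber><Status>Fail</Status><Date>05-Mar-21</Date><Time>08:56:23</Time></Result>
    <Result ID="2"><SerialNumber>S-002</SerialNumber><Status>Fail</Status><Date>06-Mar-21</Date><Time>10:00:00</Time></Result>
    <Result ID="3"><SerialNumber>S-003</SerialNumber><Status>Pass</Status><Date>05-Feb-21</Date><Time>12:00:00</Time></Result>
</Results>"""


def parse_timestamp(result):
    """Combine the Date and Time children into a datetime (format assumed)."""
    text = result.findtext("Date") + " " + result.findtext("Time")
    # %b matches English month abbreviations under the default C locale.
    return datetime.strptime(text, "%d-%b-%y %H:%M:%S")


def unique_sorted(root):
    """Keep the first Result per SerialNumber, then sort by date and time."""
    seen = set()
    unique = []
    for result in root.findall("Result"):
        serial = result.findtext("SerialNumber")
        if serial not in seen:
            seen.add(serial)
            unique.append(result)
    return sorted(unique, key=parse_timestamp)


root = ET.fromstring(SAMPLE)
ordered = unique_sorted(root)
for result in ordered:
    print(result.findtext("SerialNumber"), result.findtext("Date"))
```

Sorting on a parsed datetime matters here because the 05-Mar-21 style does not sort chronologically as plain text.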

Among others, I tried weasyprint, htmldoc, html5-to-pdf and wkhtmltopdf, without success.

I'm open to any suggestions.


EDIT: I experimented with XSL-FO (as suggested in a comment) and I have to admit that it works pretty well. It seems to me that page control is even more readable than in CSS. The only problem I see now is that it requires additional installations (Apache FOP and a Java Runtime Environment). In my scenario it would be better to have FOP in .NET.

Anyway, I decided to describe my CSS-based solution with Chrome as the renderer in more detail, because it does not require additional installers (or at least that is what I think). I have already spent some time on it and it seems to be almost working. Perhaps someone will spot where the problem is, and then it could become a pretty nice solution for paged media with CSS.

The complete transform.xsl file is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
    <html>
        <head>
            <title>Summary</title>
            <link href="css/style.css" rel="stylesheet" type="text/css"/>
            <link href="css/interface.css" rel="stylesheet" type="text/css"/>
            <script src="js/paged.polyfill.js"/>
        </head>
        <body>
            <h1>Summary</h1>
            <table class="summary">
                <tr><td>Total Quantity:</td><td><xsl:value-of select="count(Results/Result)"/></td></tr>
                <tr><td>Passed:</td><td><xsl:value-of select="count(Results/Result[Status='Pass'])"/></td></tr>
                <tr><td>Failed:</td><td><xsl:value-of select="count(Results/Result[Status='Fail'])"/></td></tr>
            </table>
            <br />
            <xsl:apply-templates/>
        </body>
    </html>
</xsl:template>

    <xsl:template match="Results">
        <table class="results">
            <tr>
                <th>Serial Number</th>
                <th>Test Result</th>
                <th>Date</th>
                <th>Time</th>
            </tr>
            <xsl:for-each select="Result">
                <tr>
                    <xsl:attribute name="class"><xsl:value-of select="./Status"/></xsl:attribute>
                    <td><xsl:value-of select="./SerialNumber"/></td>
                    <td><xsl:value-of select="./Status"/></td>
                    <td><xsl:value-of select="./Date"/></td>
                    <td><xsl:value-of select="./Time"/></td>
                </tr>
            </xsl:for-each>
        </table>
    </xsl:template>

</xsl:stylesheet>
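
The three count() expressions in the summary table above can be cross-checked outside the browser. The following is a small sketch using Python's standard library, not part of the original setup; the sample records and the `summarize` name are hypothetical. Note that the parsed root here is already the Results element, so the paths start at Result rather than Results/Result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the same shape as the report file.
SAMPLE = """<Results>
    <Result ID="0"><SerialNumber>S-001</SerialNumber><Status>Fail</Status></Result>
    <Result ID="1"><SerialNumber>S-002</SerialNumber><Status>Pass</Status></Result>
    <Result ID="2"><SerialNumber>S-003</SerialNumber><Status>Fail</Status></Result>
</Results>"""


def summarize(root):
    """Mirror the stylesheet's count() expressions with ElementTree paths."""
    return {
        # count(Results/Result)
        "total": len(root.findall("Result")),
        # count(Results/Result[Status='Pass'])
        "passed": len(root.findall("Result[Status='Pass']")),
        # count(Results/Result[Status='Fail'])
        "failed": len(root.findall("Result[Status='Fail']")),
    }


root = ET.fromstring(SAMPLE)
summary = summarize(root)
print(summary)
```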

In my style.css I have the @page rules below. There is more in my original file, but it's not important here.

@page {
    size: A4;
    margin: 2cm 1cm;

    @top-left {
        content: "Summary Continued...";
        font-size: 15px;
    }

    @bottom-center{
        content: "Page " counter(page) "/" counter(pages);
        font-size: 15px;
    }
}

@page :first {
    @top-left {
        content: "";
    }
}

The data is stored in Results.xml (see below). You can put as many Result elements in this XML file as you want. I have 200 results in my file, but I truncated it here for clarity.

<?xml version="1.0" encoding="iso-8859-1" ?><?xml-stylesheet type="text/xsl" href="xsl/transform.xsl"?>

<Results>
    <Result ID="0">
        <SerialNumber>8652280431</SerialNumber>
        <Status>Fail</Status>
        <Date>05-Mar-21</Date>
        <Time>08:56:23</Time>
    </Result>
    <Result ID="1">
        <SerialNumber>11124002643</SerialNumber>
        <Status>Fail</Status>
        <Date>05-Mar-21</Date>
        <Time>08:56:23</Time>
    </Result>
.
.
.
    <Result ID="200">
        <SerialNumber>6616001379</SerialNumber>
        <Status>Fail</Status>
        <Date>05-Mar-21</Date>
        <Time>08:56:23</Time>
    </Result>
</Results>

My files also include interface.css and paged.polyfill.js, as described in the documentation (see transform.xsl).

When I open my Results.xml in Chrome with the command chrome.exe --allow-file-access-from-files <file path>, it works as expected (see image below).

image

When I try to print my Results.xml in Chrome with the command chrome.exe --headless --disable-gpu --allow-file-access-from-files --run-all-compositor-stages-before-draw --virtual-time-budget=100000 --print-to-pdf=<destination> <source>, it generates unexpected results: only a couple of pages are generated, and the total page count (counter(pages)) shows 0 in the footer (see image below).

image

So maybe someone will figure out how to make it work.

Perhaps --js-flags will do the trick? Or maybe I should add/change something in paged.polyfill.js?
