I'm trying to transform a simple HTML page to XSL-FO, to feed into Apache FOP for PDF rendering. The steps are: HTML+CSS -> XHTML -> XSL-FO -> PDF.
I've used the Java library CSSToXSLFO to transform XHTML to XSL-FO. This works; however, it cannot handle embedded images.
Are there any tools to transform
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>hello</title>
</head>
<body>
<h1 style="color: green">Hello world!</h1>
<img src="data:image/png;base64,iVBORw...=" />
</body>
</html>
into
<fo:flow flow-name="xsl-region-body">
<fo:block>
<fo:block color="green">Hello world!</fo:block>
<fo:external-graphic src="url(data:image/png;base64,iVBORw...=)" content-height="scale-to-fit" content-width="scale-to-fit" scaling="uniform"/>
</fo:block>
</fo:flow>
?
If the FOP processor supports data URIs in fo:external-graphic, you can of course use XSLT to transform XHTML to XSL-FO. A minimal stylesheet only needs templates for the h1 and the img element. I haven't tried to spell out any transformation from the HTML CSS style attribute to XSL-FO presentational attributes, but you can of course use e.g. <xsl:apply-templates select="@*, node()"/> instead of <xsl:apply-templates/> and then add templates to transform e.g. style="color: green" to color="green". As CSS has its own, non-XML syntax, writing a full parser for arbitrary style attributes is obviously a demanding task beyond the scope of StackOverflow answers.
I am also not quite sure about the allowed src attribute syntax in XSL-FO. FOP seems to understand the direct src="{@src}" just fine, but of course, to create the format you indicated in your question, you could just as well use src="url({@src})".
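As a rough sketch of such a stylesheet, here is one way it could look. It is written as XSLT 1.0, so it selects @style explicitly rather than using the XSLT 2.0 sequence @*, node(). The page master name "page" and the A4 page size are assumptions of mine, not something taken from the question. The @style template is deliberately naive and only understands a color declaration; it is not a CSS parser.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:output method="xml"/>

  <!-- Root: build the FO page scaffolding, then process only the XHTML body.
       The master name "page" and the A4 dimensions are assumed values. -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-height="29.7cm" page-width="21cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="xhtml:html/xhtml:body"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- body becomes the outer block -->
  <xsl:template match="xhtml:body">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- h1 becomes a nested block; attributes must be emitted before content -->
  <xsl:template match="xhtml:h1">
    <fo:block>
      <xsl:apply-templates select="@style"/>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Naive style handling: extracts the value after "color:" up to the
       next ";". It would also be fooled by e.g. "background-color:". -->
  <xsl:template match="@style[contains(., 'color:')]">
    <xsl:attribute name="color">
      <xsl:value-of select="normalize-space(
          substring-before(concat(substring-after(., 'color:'), ';'), ';'))"/>
    </xsl:attribute>
  </xsl:template>

  <!-- Any other style attribute is dropped instead of being copied as text -->
  <xsl:template match="@style"/>

  <!-- img becomes fo:external-graphic; the data URI is passed through as is -->
  <xsl:template match="xhtml:img">
    <fo:external-graphic src="url({@src})"
                         content-height="scale-to-fit"
                         content-width="scale-to-fit"
                         scaling="uniform"/>
  </xsl:template>

</xsl:stylesheet>
```

The sketch uses the src="url({@src})" form so that the result matches the fo:external-graphic shown in the question; swapping in src="{@src}" is a one-attribute change. Whether the embedded image actually renders still depends on the FOP version accepting data URIs.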