I have read and tested a lot, but I can't get a working Java solution.
I have a large XML file (more than 100 MB) that is currently processed via JAXB. The aim is to split it into many XML files, each containing one child of the root element.
Important: because of the file size, a SAX-style approach is preferred.
I found a lot of information about xsl:result-document, but I found no way to get it running from Java, and I am not sure whether it would keep memory usage low.
This is my Java code:
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TestParse {
    public static void main(final String[] args) throws Throwable {
        // Compile the stylesheet, then run it over the input file.
        final TransformerFactory factory = TransformerFactory.newInstance();
        final Transformer transformer = factory.newTransformer(new StreamSource("D:\\split.xsl"));
        final StreamSource in = new StreamSource("D:\\input.xml");
        final StreamResult out = new StreamResult("D:\\output.xml");
        transformer.transform(in, out);
    }
}
This is an example XML file ("input.xml"):
<?xml version="1.0" encoding="ISO-8859-1"?>
<Taskname>
    <Item attr="ab" attr2="c">
        <MoreNodes>...</MoreNodes>
    </Item>
    <Item attr="xy" attr2="z">
        <MoreNodes>...</MoreNodes>
    </Item>
    <!-- ...and many items more -->
</Taskname>
This is my XSL (split.xsl):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:strip-space elements="*"/>
    <xsl:param name="dir" select="'file:///D://'"/>
    <xsl:template match="Item">
        <xsl:result-document href="{$dir}section{position()}.xml" method="xml">
            <Taskname>
                <xsl:copy-of select="." />
            </Taskname>
        </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>
So one result XML should look like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Taskname>
    <Item attr="..." attr2="...">
        <MoreNodes>...</MoreNodes>
    </Item>
</Taskname>
My problem:
I really don't know how to get at the different outputs of the XSLT. More than that, I need them as streams and not as files, and I need them item by item (like SAX's endElement) to use less memory.
If there is another, better way than XSLT, please just tell me.
Firstly, if you want to avoid building a tree for the source document in memory, then you will have to run this with XSLT 3.0 streaming, which means you need a Saxon-EE license. (However, it is quite feasible to process a 100 MB file the traditional way, with a tree in memory.)
Secondly, if you want the output of xsl:result-document to be captured as in-memory streams rather than being written to filestore, then in Saxon the way to achieve this is to write and register an OutputURIResolver. This will be called once for each result document, and can specify a destination (such as a StreamResult or SAXResult) to receive the document.
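If Saxon is not an option, the split can also be done without XSLT, which the question explicitly allows. The following is only a sketch of a different technique from the one described above: it uses the JDK's StAX event API (javax.xml.stream) instead of xsl:result-document. The class name ItemSplitter, the method split, and the Consumer<String> callback are hypothetical names chosen for illustration. The sketch assumes that every child element of the root should become its own document, wrapped in a copy of the root tag (without the root's attributes), and it buffers only one item at a time.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of any library.
public class ItemSplitter {

    /**
     * Reads the input as a stream of events and calls sink once for each
     * child element of the root, passing that child wrapped in the root tag.
     */
    public static void split(InputStream input, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        QName root = null;            // name of the document element
        int depth = 0;                // current element nesting depth
        StringWriter buffer = null;   // holds the item currently being copied
        XMLEventWriter writer = null; // non-null only while inside an item

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (event.isStartElement()) {
                depth++;
                if (depth == 1) {
                    root = event.asStartElement().getName();
                } else if (depth == 2) {
                    // A new item starts: open a fresh buffer and write the wrapper tag.
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                    writer.add(events.createStartElement(
                            root.getPrefix(), root.getNamespaceURI(), root.getLocalPart()));
                }
            }

            // Copy everything that belongs to the current item.
            if (writer != null) {
                writer.add(event);
            }

            if (event.isEndElement()) {
                if (depth == 2) {
                    // The item is complete: close the wrapper and hand it over.
                    writer.add(events.createEndElement(
                            root.getPrefix(), root.getNamespaceURI(), root.getLocalPart()));
                    writer.flush();
                    writer.close();
                    sink.accept(buffer.toString());
                    writer = null;
                    buffer = null;
                }
                depth--;
            }
        }
        reader.close();
    }
}
```

A call such as ItemSplitter.split(new FileInputStream("D:\\input.xml"), doc -> process(doc)) would then deliver the items one by one, where process stands for whatever should consume each document. Passing an InputStream rather than a Reader lets the parser honor the encoding declared in the file. Comments and whitespace between items are skipped because nothing is copied while no item is open.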