XML API for best performance


I have an application that works with a lot of XML data, so I'd like to ask which is the best API for handling XML in Java. Today I'm using the W3C DOM API and, for performance reasons, I want to migrate to some other API. I build XML from scratch, do a lot of transforms, import into databases (MySQL, MSSQL, etc.), export from the database to HTML, modify those XML documents, and more.

Is JDOM the best option? Do you know of anything better than JDOM? I have read about Javolution. Has anybody used it?

Which API do you recommend?


3 Answers

BEST ANSWER

If you have vast amounts of data, the main thing is to avoid having to load it all into memory at once (because it will use a vast amount of memory, and because it prevents you from overlapping I/O and processing). Sadly, I believe most DOM and DOM-like libraries (like DOM4J) do just that, so they are not well suited to processing vast amounts of XML efficiently.

Instead, look at using a streaming API, like SAX or StAX. StAX is, in my experience, usually easier to use.

There are other APIs that try to give you the convenience of DOM with the performance of SAX. Javolution might be one; VTD-XML is another. But to be honest, I find StAX quite easy to work with - it's basically a fancy stream, so you just think in the same way as if you were reading a text file from a stream.
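To make the "fancy stream" point concrete, here is a minimal StAX pull-parsing sketch using only the JDK's `javax.xml.stream` package. The element name `entry` and the sample document are just illustrative:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

// Walk the document as a stream of events, reacting to each one as it
// arrives instead of building a full tree in memory.
public class StaxCount {

    // Counts elements with the given local name; memory use stays flat
    // regardless of document size.
    public static int countElements(String xml, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(name)) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><entry>a</entry><entry>b</entry></feed>";
        System.out.println(countElements(xml, "entry")); // prints 2
    }
}
```

In a real application you would pass a `FileInputStream` instead of a `StringReader`, so the parser pulls bytes from disk as it goes.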

One thing you might try is combining JAXB with StAX. The idea is that you stream the file using StAX, then use JAXB to unmarshal chunks within it. For instance, if you were processing an Atom feed, you could open it, read past the header, then work in a loop unmarshalling entry elements to objects one at a time. This only really works if your format consists of a sequence of independent elements, like Atom; it would be largely useless on something richer like XHTML. You can see examples of this in the JAXB reference implementation and a guy's blog post.
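A sketch of that StAX + JAXB combination, assuming the classic `javax.xml.bind` JAXB API is on the classpath (it shipped with the JDK up to Java 10; later JDKs need it as a dependency). The `Entry` class here is a made-up stand-in for a real Atom entry mapping:

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Stream the document with StAX, then hand the reader to JAXB each
// time the cursor sits on the start of an <entry> element, so only one
// entry is materialised as an object at a time.
public class StaxJaxb {

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Entry {
        public String title;
    }

    public static List<String> titles(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Unmarshaller um = JAXBContext.newInstance(Entry.class).createUnmarshaller();
        List<String> titles = new ArrayList<>();
        while (reader.hasNext()) {
            // Advance until the cursor is on the start of an <entry>...
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("entry")) {
                // ...then let JAXB consume exactly that subtree.
                Entry e = um.unmarshal(reader, Entry.class).getValue();
                titles.add(e.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><entry><title>one</title></entry>"
                   + "<entry><title>two</title></entry></feed>";
        System.out.println(titles(xml)); // [one, two]
    }
}
```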


Well, most developers I know, myself included, use dom4j. If you have the time, you could write a small performance test using both frameworks; then you will see the difference. I prefer dom4j.


The answer depends on what performance aspects are important for your application. One factor is whether you are handling large XML documents.

For parsing, DOM-based approaches will not scale well to large documents. If you need to parse large documents, non-DOM parsers such as those using SAX and StAX will be faster and less resource intensive. However, if you need to transform XML after parsing, using either XSL or a DOM API, you are going to need the whole document in memory in any case.
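For comparison with the pull-based StAX approach, here is a minimal SAX sketch using only the JDK: the parser pushes events into a handler you supply, so only the current element is ever held in memory. The element names are illustrative:

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// SAX inverts control relative to StAX: instead of asking the reader
// for the next event, you register callbacks and the parser calls you.
public class SaxCount {

    public static int countElements(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Without namespace processing, qName is the element name.
                if (qName.equals(name)) count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>", "b")); // prints 2
    }
}
```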

For creating XML from code, StAX provides a nice API for this. Since the approach is stream-based, this will scale well to writing very large documents.
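A sketch of that stream-based writing with the JDK's `XMLStreamWriter`: each element is emitted as soon as it is written, so memory use stays flat however large the output grows. The `catalog`/`item` names are just for illustration:

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;

// Write XML incrementally: the writer tracks open elements for you,
// but never buffers the whole document.
public class StaxWrite {

    public static String writeCatalog(String... names) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("catalog");
        for (String name : names) {
            w.writeStartElement("item");
            w.writeCharacters(name); // text content is escaped automatically
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeCatalog("a", "b"));
    }
}
```

For a very large document, you would hand the factory an `OutputStreamWriter` over a file instead of a `StringWriter`.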