Handling large amounts of nested elements with libxml SAX parser


I am currently using the SAX interface of the libxml library to parse a large number (around 60,000) of XML documents, each less than 1 MB in size. I chose SAX because I thought it would be the most efficient option. Would there be much of a performance difference in this use case compared with, say, a DOM parser?

Also, in my current approach I have an enum with a large number of states, which I use in a switch statement in my startElement/endElement handlers. The number of states is growing quite large and becoming unmanageable. Is there a better way to handle this in libxml? For example, I've noticed that some Java libraries let you create multiple parser instances, so that when you enter a certain element you can delegate to another parser for that element.

1 Answer

Best answer:

When you say "efficient", I guess you are talking about machine efficiency? But programmer efficiency is much more important, and as you've discovered, writing SAX applications to process complex XML requires a lot of complex code that is hard to develop and hard to debug.

You haven't said what the output of your processing should be. By default, I would start by writing it in the most programmer-efficient language available, typically XQuery or XSLT, and only resort to a lower-level language if you can't achieve the performance requirements that way.