Handling large amounts of nested elements with libxml SAX parser


I am currently using the SAX interface of the libxml library to parse a large number (around 60,000) of XML documents, each less than 1 MB in size. I chose SAX because I thought it would be the most efficient option. Would there be much of a performance difference in this use case compared with, say, a DOM parser?

Also, in my current approach I have an enum with a large number of states, which I use in a switch statement in my startElement/endElement handlers. The number of states is growing quite large and becoming unmanageable. Is there a better way to handle this in libxml? For example, I've noticed that some Java libraries let you create multiple parser instances, so that when you enter a certain element you can delegate to another parser for that element.

1 Answer

Best answer:

When you say "efficient", I guess you are talking about machine efficiency? But programmer efficiency is much more important, and as you've discovered, writing SAX applications to process complex XML requires a lot of complex code that is hard to develop and hard to debug.

You haven't said what the output of your processing should be. By default, I would start by writing it in the most programmer-efficient language available, typically XQuery or XSLT, and only resort to a lower-level language if you can't achieve the performance requirements that way.