Performance issues while searching for XPath using TinyXPath

I have a fairly large XML file (~11MB) and I'm using TinyXPath to locate some nodes. Even though the node I'm looking for cannot be confused with other nodes later in the DOM, the XPath query takes several minutes to return.

Here is my sample XML:

<RootElement>
  <Header>
    <Location>1234</Location>
    ... maybe a dozen sibling nodes
  </Header>
  <EventReport>
    <SomeEvent>with a few dozen child nodes</SomeEvent>
      ... 2,000+ SomeEvent nodes
  </EventReport>
</RootElement>

And here is my C++ code:

TiXmlDocument doc;
doc.LoadFile("C:\\Path\\To\\file.xml");
TiXmlNode *locationNode = TinyXPath::XNp_xpath_node(doc.RootElement(), "//RootElement/Header/Location");

From pausing and examining the stack trace, it looks like it is trying to parse and traverse the entire XML structure. However, RootElement has only two child nodes: Header and EventReport. Since I'm not looking for anything under the (very large) EventReport node, I would expect this query to be very quick.

Also, if I scale down the sample XML to only contain a few SomeEvent nodes, then this query returns almost instantly.

Is this a known limitation with TinyXPath? Is there a better way to structure my query to return in a timely manner?

There is 1 best solution below

It's likely that the cost is not in evaluating the XPath expression but in parsing the source document into a tree that the XPath engine can work on. You say that RootElement has only two child nodes, but the XPath engine has no way of knowing this until the document has been parsed. Having said that, there's no reason it should take minutes: one second per megabyte would be reasonable, and anything more looks inefficient. However, I don't know TinyXPath; perhaps it is optimized for size rather than speed?
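
One way to check where the time actually goes is to time the load and the query separately. Below is a minimal sketch of a timing helper using only the standard library; the name elapsed_ms is made up for illustration and is not part of TinyXML or TinyXPath. The commented-out usage reuses the calls from the question and assumes the TinyXML and TinyXPath headers are available.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of TinyXML/TinyXPath): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible usage with the code from the question:
//
//   TiXmlDocument doc;
//   double load_ms = elapsed_ms([&] {
//       doc.LoadFile("C:\\Path\\To\\file.xml");
//   });
//   double query_ms = elapsed_ms([&] {
//       TinyXPath::XNp_xpath_node(doc.RootElement(),
//                                 "//RootElement/Header/Location");
//   });
```

If load_ms dominates, the parser is the bottleneck; if query_ms dominates, the time is being spent in XPath evaluation, and the query itself is what needs attention.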