I have a fairly large XML file (~11MB) and I'm using TinyXPath to locate some nodes. Despite the fact that the node I'm trying to locate cannot be confused with other nodes later in the DOM, it is taking several minutes for the XPath query to return.
Here is my sample XML:
<RootElement>
<Header>
<Location>1234</Location>
... maybe a dozen sibling nodes
</Header>
<EventReport>
<SomeEvent>with a few dozen child nodes</SomeEvent>
... 2,000+ SomeEvent nodes
</EventReport>
</RootElement>
And here is my C++ code:
TiXmlDocument doc;
doc.LoadFile("C:\\Path\\To\\file.xml");
TiXmlNode *locationNode = TinyXPath::XNp_xpath_node(doc.RootElement(), "//RootElement/Header/Location");
From pausing and examining the stack trace, it looks like it is trying to parse and traverse the entire XML structure. However, RootElement only has 2 child nodes: Header and EventReport. And since I'm not looking for anything under the (very large) EventReport node, I would hope this query would be very quick.
Also, if I scale down the sample XML to only contain a few SomeEvent nodes, then this query returns almost instantly.
Is this a known limitation with TinyXPath? Is there a better way to structure my query to return in a timely manner?
It's likely that the cost is not in evaluating the XPath, but in parsing the source document into a tree suitable for the XPath engine to work on. You say that RootElement has only 2 child nodes, but there is no way the XPath engine can know this until the document has been parsed. Having said that, there's no reason it should take minutes. One second per megabyte would be reasonable; anything more looks inefficient. However, I don't know the TinyXPath technology: perhaps it is optimized for size rather than speed?
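One way to check where the time actually goes is to time the parse and the query separately. The following is only a minimal sketch: elapsed_ms is a hypothetical helper name, not part of TinyXML or TinyXPath. The usage shown in the comments reuses the calls from the question, plus TinyXML's FirstChildElement for comparison.

```cpp
#include <chrono>

// Hypothetical helper (not part of TinyXML or TinyXPath): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Sketch of how it might be applied to the code in the question:
//
//   TiXmlDocument doc;
//   double parseMs = elapsed_ms([&] {
//       doc.LoadFile("C:\\Path\\To\\file.xml");            // phase 1: build the tree
//   });
//
//   TiXmlNode *locationNode = nullptr;
//   double queryMs = elapsed_ms([&] {
//       locationNode = TinyXPath::XNp_xpath_node(          // phase 2: evaluate the XPath
//           doc.RootElement(), "//RootElement/Header/Location");
//   });
//
//   // For comparison, walk to the same node with plain TinyXML calls
//   // (null checks omitted here for brevity):
//   TiXmlElement *direct = nullptr;
//   double walkMs = elapsed_ms([&] {
//       direct = doc.RootElement()
//                    ->FirstChildElement("Header")
//                    ->FirstChildElement("Location");
//   });
```

If parseMs dominates, the time is going into building the tree, as suggested above. If queryMs dominates while walkMs is negligible, the XPath evaluation itself is the slow part.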