How to estimate the memory needed by XPathDocument for a specific XML file


Is there any way to estimate the memory requirement for creating an XPathDocument instance based on the file size of the XML?

XPathDocument xdoc = new XPathDocument(xmlfile);

Is there any way to programmatically stop the process of creating the XPathDocument if memory drops to a very low level?

Since it loads the entire XML into memory, it would be nice to know ahead of time if the XML is too big. What I have found is that when I create a new XPathDocument from a big XML file, an OutOfMemoryException is never thrown; instead the process slows to a crawl, only 5 MB of memory remains available, and Task Manager reports the application as not responding. This happened with a 266 MB XML file when there was 584 MB of RAM available. I was able to load a 150 MB file with no problems in 18.

After loading the XML, I want to run XPath queries using an XPathNavigator and an XPathNodeIterator. I am using .NET 2.0 on XP SP3.
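For context, a minimal sketch of how I intend to use the document once it is loaded (the file path and the XPath expression are just placeholders):

using System;
using System.Xml.XPath;

class QuerySketch
{
    static void Main()
    {
        // Loading the whole document into memory is the step whose
        // footprint I would like to estimate beforehand.
        XPathDocument xdoc = new XPathDocument("books.xml");    // placeholder file

        // Then run a query and walk the results.
        XPathNavigator nav = xdoc.CreateNavigator();
        XPathNodeIterator it = nav.Select("//book/title");      // placeholder XPath
        while (it.MoveNext())
        {
            Console.WriteLine(it.Current.Value);
        }
    }
}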


There are 3 best solutions below

You can simply check the file size and back out if it exceeds a certain upper bound.

System.IO.FileInfo xmlFileInfo = new System.IO.FileInfo(xmlfile);
bool isTooBig = xmlFileInfo.Length > maximumSize;   // maximumSize: your chosen limit in bytes

This will not be foolproof, because you cannot guess at what the correct maximum size will be.


Yes, sure, you can do it with the FileInfo class:

System.IO.FileInfo foo = new System.IO.FileInfo("<your file path as string>");
long size = foo.Length;   // file size in bytes

In short, no, you cannot, unless you always work with similar files and can gather statistical data before starting the estimations.

Since tag, attribute, prefix and namespace strings are interned, how efficient the in-memory storage can be depends largely on the structure of the XML file, and the ratio compared to the file on disk also depends on the encoding used.

In general, .NET stores any string as UTF-16 in memory. Therefore, even if there were no significant structural overhead (imagine an XML file with only a single root tag and lots of plain text in it), the memory used would still be double the size of a UTF-8 source file (or an ASCII or any other 8-bit encoded file). So string encoding is the first part of the equation.
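As a rough illustration of that doubling (the sample text is just a stand-in for the document's character data, and the numbers only cover the characters, not the document structure):

using System;
using System.Text;

class EncodingSize
{
    static void Main()
    {
        string text = "lots of plain element text";              // stand-in for the character data

        int bytesOnDiskUtf8 = Encoding.UTF8.GetByteCount(text);  // size as stored in a UTF-8 file
        int bytesInMemory   = text.Length * 2;                   // .NET strings are UTF-16: 2 bytes per char

        Console.WriteLine("UTF-8 on disk:  {0} bytes", bytesOnDiskUtf8);
        Console.WriteLine("UTF-16 in RAM: ~{0} bytes", bytesInMemory);
    }
}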

The other thing is that a data structure is built in memory to allow efficient traversal of the document. Typically, nodes are constructed and linked together with references, so each node uses up a certain amount of memory. Since most of this non-value data consists of references, the memory used here also depends heavily on the architecture (a 64-bit system uses twice as much memory for a single reference as a 32-bit system). So if you have a very complex document with little data (e.g. a large number of small elements with little text or few attribute values), your memory usage will be much higher than the original document size, and it will also depend a lot on the architecture your application runs on.
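You can see the reference-size difference on your own machine; the per-node byte counts below are only a hypothetical illustration, not XPathDocument's actual internal layout:

using System;

class ReferenceSize
{
    static void Main()
    {
        // 4 in a 32-bit process, 8 in a 64-bit process.
        Console.WriteLine("Reference size: {0} bytes", IntPtr.Size);

        // Hypothetical back-of-the-envelope figure: if each node held, say,
        // 5 references plus 16 bytes of fixed data, a million nodes would need
        // roughly nodes * (5 * IntPtr.Size + 16) bytes for the structure alone.
        long nodes = 1000000;
        long estimate = nodes * (5 * IntPtr.Size + 16);
        Console.WriteLine("Rough structural estimate: {0} bytes", estimate);
    }
}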

If you have a file with only a few, very long tag and attribute names and perhaps heavy default namespace usage, the memory used may also be much lower than the file on disk.

So, assuming an arbitrary XML file with an unknown encoding and a reasonable amount of data and complexity, it will be very difficult to get a reliable estimate. However, if your XML files are always similar in the respects mentioned, you could gather some statistics to get a factor that puts the ratio about right for your specific platform.
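A minimal sketch of how such a factor could be measured, assuming you have a representative sample file (the path is a placeholder, and GC.GetTotalMemory only approximates managed heap usage):

using System;
using System.IO;
using System.Xml.XPath;

class RatioEstimate
{
    static void Main()
    {
        string sampleFile = "representative-sample.xml";         // placeholder path
        long fileBytes = new FileInfo(sampleFile).Length;

        long before = GC.GetTotalMemory(true);                   // force a collection for a cleaner baseline
        XPathDocument doc = new XPathDocument(sampleFile);
        long after = GC.GetTotalMemory(true);

        double ratio = (double)(after - before) / fileBytes;
        Console.WriteLine("Approx. in-memory bytes per on-disk byte: {0:F2}", ratio);

        GC.KeepAlive(doc);                                       // keep the document alive past the measurement
    }
}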

However, note that looking at "free memory" in Task Manager or talking about a "very low memory level" are very vague quantifications. Virtual memory, caches, background applications and services, etc. all influence how much raw memory is effectively available. The .NET Framework therefore cannot reliably guess how much memory a single process should be allowed to use while remaining performant, or at what point it can still throw an OutOfMemoryException safely. So if you get one of those exceptions, you are usually well beyond any point from which your application could recover, and you should not try to catch and handle them.
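If you want to guard the load up front rather than react to an exception, one option worth mentioning (only as good as your size estimate; the 4x multiplier below is purely an assumed factor) is System.Runtime.MemoryFailPoint, available since .NET 2.0, which checks whether a given amount of memory is likely to be available before you start:

using System;
using System.IO;
using System.Runtime;
using System.Xml.XPath;

class GuardedLoad
{
    static void Main()
    {
        string xmlfile = "big.xml";                              // placeholder path
        long fileBytes = new FileInfo(xmlfile).Length;

        // Assumed factor: guess that the document needs about 4x its file size
        // in memory. Measure this for your own files (see the ratio sketch above).
        int neededMegabytes = (int)(fileBytes * 4 / (1024 * 1024)) + 1;

        try
        {
            using (MemoryFailPoint gate = new MemoryFailPoint(neededMegabytes))
            {
                XPathDocument xdoc = new XPathDocument(xmlfile);
                // ... run the XPath queries here ...
                GC.KeepAlive(xdoc);
            }
        }
        catch (InsufficientMemoryException)
        {
            Console.WriteLine("Not enough memory is expected to be available; skipping the load.");
        }
    }
}

Whether that helps depends entirely on how good the size factor is, which brings you back to the statistics described above.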