Fast way to get the number of elements in an XML document


Is there a best practice for getting the number of elements in an XML document for progress reporting purposes? I have a 2 GB XML file containing flights that I need to process. My idea is to first count all the elements in the file and then use a counter to show that x of y flights have been imported into our database.

For the file processing we are using the XmlTextReader in .NET (C#) to get the data without reading the whole document into memory (similar to SAX parsing).

So the question is: how can I get the number of those elements quickly? Is there a best practice, or should I go through the whole document first and do something like i++?

Thanks!


There are 2 best solutions below

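using System.IO;
using System.Xml;

// Counts <Flight> elements in a single forward-only pass.
// "text" holds the XML as a string; for a large file, give the reader
// the file path or a Stream instead of loading the document into memory.
int count = 0;
using (XmlReader xmlReader = new XmlTextReader(new StringReader(text)))
{
    while (xmlReader.Read())
    {
        if (xmlReader.NodeType == XmlNodeType.Element &&
            xmlReader.Name == "Flight")
        {
            count++;
        }
    }
}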

You certainly can just read the document twice: once simply to count the elements, for example by calling XmlReader.ReadToFollowing repeatedly (or possibly ReadToNextSibling) and increasing a counter as you go:

// "reader" is an open XmlReader over the file; "name" is the element name to count.
int count = 0;
while (reader.ReadToFollowing(name))
{
    count++;
}

However, that does mean reading the file twice...

An alternative is to find the length of the file, and as you read through the file once, report the percentage of the file processed so far, based on the position of the underlying stream. This will be less accurate, but far more efficient. You'll need to create the XmlReader directly from a Stream so that you can keep checking the position though.
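The stream-position approach could look roughly like the sketch below. It is only an illustration under some assumptions: the class name FlightImport, the method ImportWithProgress, and the onProgress callback are made up for this example, and the stream is assumed to be seekable (a FileStream, for instance) so that Length and Position are available.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: the names here are illustrative, not from the question.
static class FlightImport
{
    // Reads the XML once and reports progress as the percentage of bytes
    // the reader has pulled from the underlying stream so far.
    public static int ImportWithProgress(Stream stream, string elementName, Action<int> onProgress)
    {
        int count = 0;
        int lastPercent = -1;
        long length = stream.Length; // needs a seekable stream, e.g. FileStream

        using (XmlReader reader = XmlReader.Create(stream))
        {
            while (reader.ReadToFollowing(elementName))
            {
                count++;
                // ... process the current element here ...

                // The reader buffers its input, so Position runs somewhat ahead
                // of the element being processed; the percentage is approximate.
                int percent = length == 0 ? 100 : (int)(stream.Position * 100 / length);
                if (percent != lastPercent)
                {
                    lastPercent = percent;
                    onProgress(percent);
                }
            }
        }

        // Make sure the caller sees a final 100% once the document is exhausted.
        if (lastPercent != 100)
        {
            onProgress(100);
        }
        return count;
    }
}
```

It might be called as FlightImport.ImportWithProgress(fileStream, "Flight", p => Console.WriteLine(p + "% processed")), where fileStream comes from File.OpenRead on the XML file. Reporting only when the integer percentage changes keeps the callback from firing once per element.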