Try-Catch statement ends While loop reading through an XML file in C#


I have a while loop going through an XML file, and one of the nodes, "url", sometimes contains invalid values. I put a try-catch statement around the read to catch any invalid values. The problem is that whenever an invalid value is read, the while loop ends and the program continues on outside of that loop. I need the while loop to keep reading through the rest of the XML file after an invalid value is found.

Here is my code:

        XmlTextReader reader = new XmlTextReader(fileName);
        int tempInt;

        while (reader.Read())
        {
            switch (reader.Name)
            {
                case "url":
                    try
                    {
                        reader.Read();
                        if (!reader.Value.Equals("\r\n"))
                        {
                            urlList.Add(reader.Value);
                        }
                    }
                    catch
                    {                            
                        invalidUrls.Add(urlList.Count);   
                    }
                    break;
            }
        }

I chose not to include the rest of the switch statement as it is not relevant. Here is a sample of my XML:

<?xml version="1.0"  encoding="ISO-8859-1" ?>
<visited_links_list>
    <item>
        <url>http://www.grcc.edu/error.cfm</url>
        <title>Grand Rapids Community College</title>
        <hits>20</hits>
        <modified_date>10/16/2012 12:22:37 PM</modified_date>
        <expiration_date>11/11/2012 12:22:38 PM</expiration_date>
        <user_name>testuser</user_name>
        <subfolder></subfolder>
        <low_folder>No</low_folder>
        <file_position>834816</file_position>
     </item>
</visited_links_list>

The exception I get throughout the code is similar to the following:

"' ', hexadecimal value 0x05, is an invalid character. Line 3887, position 13."

There are 4 best solutions below

2
On BEST ANSWER

Observation:

You're calling reader.Read() twice for each entry: once in the while() condition and once inside the case. Do you really mean to skip nodes? Each call advances the reader to the next node in the XML stream, so the inner call consumes a node that the switch never sees. An exception thrown by the inner call is caught, but an exception thrown by the Read() in the while() condition is not, because it happens outside of your try...catch.

Beyond that:

reader.Read(); // might return false (end of input), but the return value is ignored, so keep going...

if (!reader.Value.Equals("\r\n")) // if Read() returned false there is no current node, so Value is not a URL
{
    urlList.Add(reader.Value);
}
// reader is now in an unpredictable state
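
To illustrate the point about the two Read() calls, here is a minimal sketch that checks the node type and the return value of the inner call before using Value. UrlReaderSketch and ReadUrls are hypothetical names, and the sketch assumes well-formed input; it does not by itself recover from malformed XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class UrlReaderSketch
{
    // Collects the text of every non-empty <url> element in the input.
    public static List<string> ReadUrls(TextReader input)
    {
        var urls = new List<string>();
        using (var reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                // Only react to the start tag; Name is also "url" on the end tag.
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "url")
                    continue;
                if (reader.IsEmptyElement)
                    continue;

                // Advance to the element's content and use it only if it is text.
                if (reader.Read() && reader.NodeType == XmlNodeType.Text)
                    urls.Add(reader.Value);
            }
        }
        return urls;
    }
}
```

Checking NodeType matters because reader.Name is "url" for both the start tag and the end tag, so a switch on Name alone runs the case twice per element.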

Edit

At the risk of writing a novel-length answer...

The error you're receiving

"' ', hexadecimal value 0x05, is an invalid character. Line 3887, position 13."

indicates that your source XML is malformed, and somehow wound up with a ^E (ASCII 0x05) at the specified position. I'd have a look at that line. If you're getting this file from a vendor or a service, you should have them fix their code. Correcting that, and any other malformed content within your XML, should correct the issue that you're seeing.
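
If the file can't be fixed at the source, one workaround is to strip the characters that XML 1.0 forbids before parsing. This is only a sketch, assuming the file is small enough to hold in memory as a string. XmlSanitizer and StripInvalidXmlChars are hypothetical names, and XmlConvert.IsXmlChar requires .NET 4.0 or later.

```csharp
using System.Text;
using System.Xml;

static class XmlSanitizer
{
    // Returns a copy of text without characters that are illegal in XML 1.0
    // (for example 0x05). Valid surrogate pairs are kept.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // c is a high surrogate followed by its low surrogate: keep both.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else is dropped.
        }
        return sb.ToString();
    }
}
```

The cleaned string can then be handed to a reader, for example XmlReader.Create(new StringReader(cleaned)). Keep in mind that this silently changes the data, so it is worth logging what was removed if the affected URLs matter.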

Once that is fixed, your original code should work. However, using XmlTextReader for this isn't the most robust of solutions, and involves building some code that Visual Studio will happily generate for you:

In VS2012 (I don't have VS2010 installed any more, but it should be the same process):

  • Add a sample of the XML to your solution

  • In the properties for that file, set the CustomTool to "MSDataSetGenerator" (without the quotes)

  • The IDE should generate a .designer.cs file, containing a serializable class with a field for each item in the XML. (If not, right-click on the XML file in the solution explorer and select "Run Custom Tool".)

  • Use code like the following to load XML with the same schema as your sample at runtime:

    // Make sure the XML doesn't have errors, such as non-printable characters
    private static bool IsXmlMalformed(string fileName)
    {
        // Dispose the reader so the file is released before it is opened again
        using (var reader = new XmlTextReader(fileName))
        {
            try
            {
                // Read every node; the reader throws XmlException on malformed content
                while (reader.Read()) { }
                return false;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }
    
    // Process the XML using deserializer and VS-generated XML proxy classes
    private static void ParseVisitedLinksListXml(string fileName, List<string> urlList, List<int> invalidUrls)
    {
        if (IsXmlMalformed(fileName))
            throw new Exception("XML is not well-formed.");
    
        using (var textReader = new XmlTextReader(fileName))
        {
            var serializer = new XmlSerializer(typeof(visited_links_list));
    
            if (!serializer.CanDeserialize(textReader))
                throw new Exception("Can't deserialize this XML. Make sure the XML schema is up to date.");
    
            var list = (visited_links_list)serializer.Deserialize(textReader);
    
            foreach (var item in list.item)
            {
                if (!string.IsNullOrEmpty(item.url) && !item.url.Contains(Environment.NewLine))
                    urlList.Add(item.url);
                else
                    invalidUrls.Add(urlList.Count);
            }
        }
    }
    

You can also do this with the XSD.exe tool included with the Windows SDK.

1
On

Use continue

        while (reader.Read())
        {
            switch (reader.Name)
            {
                case "url":
                    try
                    {
                        reader.Read();
                        if (!reader.Value.Equals("\r\n"))
                        {
                            urlList.Add(reader.Value);
                        }
                    }
                    catch
                    {
                        invalidUrls.Add(urlList.Count);
                        continue;
                    }
                    break;
            }
        }

1
On

I have a feeling reader is left in a faulty state after the exception is thrown, as reader.Read() (the one inside the switch, not the while) is most likely the line the exception occurred on. Then the reader.Read() in the while doesn't return anything, and the loop exits.

I did a simple switch in a console app, threw and caught an exception in it, and the containing loop keeps on going.

var s = "abcdefg";
foreach (var character in s)
{
    switch (character)
    {
        case 'c':
            try
            {
                throw new Exception("c sucks");
            }
            catch
            {
                // Swallow the exception and move on?
            }
            break;
        default:
            Console.WriteLine(character);
            break;
    }
}

If you walk through the code, does it try to run reader.Read() in the while after the exception is caught?
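
The faulty-state guess can be checked without a debugger. Here is a small sketch, with the hypothetical names ReaderStateDemo and StateAfterFailure, that swallows the exception the same way and then reports the reader's ReadState:

```csharp
using System.IO;
using System.Xml;

static class ReaderStateDemo
{
    // Reads the whole input, swallowing any XmlException, and returns
    // the state the reader is left in afterwards.
    public static ReadState StateAfterFailure(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            try
            {
                while (reader.Read()) { }
            }
            catch (XmlException)
            {
                // Swallowed on purpose, as in the question's catch block.
            }
            return reader.ReadState;
        }
    }
}
```

Calling ReaderStateDemo.StateAfterFailure("<url>bad\u0005value</url>") should report ReadState.Error, while well-formed input ends in ReadState.EndOfFile. Once the reader is in the Error state, further Read() calls return false, which would explain why the while loop stops instead of moving on to the next node.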

2
On

I am assuming you are reading an XML document such as myFile.xml, and that "url" is the element you are looking to obtain.

Load the document into an XmlDocument and use that to traverse the nodes. Entity references are decoded for you, so &amp; in the file comes back as & in InnerText. Note that XmlDocument still rejects characters that are illegal in XML 1.0, such as 0x05: LoadXml throws an XmlException, which the catch block below reports before returning null.

The method below should work given the example you provided.

        //get the text of the file into a string
        System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\test.xml");
        String xmlText = sr.ReadToEnd();
        sr.Close();  
        //Create a List of strings and call the method
        List<String> urls = readXMLDoc(xmlText);
        //check to see if we have a list
        if (urls != null)
        {
            //do something
        }


    private List<String> readXMLDoc(String fileText)
    {
        //create a list of Strings to hold our Urls
        List<String> urlList = new List<String>();
        try
        {
            //create a XmlDocument Object
            XmlDocument xDoc = new XmlDocument();
            //load the text of the file into the XmlDocument Object
            xDoc.LoadXml(fileText);
            //Create a XmlNode object to hold the root node of the XmlDocument
            XmlNode rootNode = null;
            //get the root element in the xml document
            for (int i = 0; i < xDoc.ChildNodes.Count; i++)
            {
                //check to see if it is the root element
                if (xDoc.ChildNodes[i].Name == "visited_links_list")
                {
                    //assign the root node
                    rootNode = xDoc.ChildNodes[i];
                    break;
                }
            }

            //Loop through each of the child nodes of the root node
            for (int j = 0; j < rootNode.ChildNodes.Count; j++)
            {
                //check for the item tag
                if (rootNode.ChildNodes[j].Name == "item")
                {
                    //assign the item node
                    XmlNode itemNode = rootNode.ChildNodes[j];
                    //loop through each of the item tag's elements
                    foreach (XmlNode subNode in itemNode.ChildNodes)
                    {
                        //check for the url tag
                        if (subNode.Name == "url")
                        {
                            //add the url string to the list
                            urlList.Add(subNode.InnerText);
                        }
                    }
                }
            }
        }
        catch (Exception e)
        {
            System.Windows.Forms.MessageBox.Show(e.Message);
            return null;
        }
        //return the list
        return urlList;
    }
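
The nested loops above can also be written as a single XPath query. Here is a shorter sketch of the same idea, assuming the element names from the sample XML; XPathUrlSketch and ReadUrls are hypothetical names. Unlike the method above, it lets exceptions propagate to the caller.

```csharp
using System.Collections.Generic;
using System.Xml;

static class XPathUrlSketch
{
    // Returns the text of every <url> under /visited_links_list/item.
    public static List<string> ReadUrls(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText); // throws XmlException if the text is not well-formed

        var urls = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/visited_links_list/item/url"))
        {
            urls.Add(node.InnerText);
        }
        return urls;
    }
}
```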