RapidXML weird parsing


I have a very annoying problem that I have been trying to solve for hours. I'm using RapidXML with C++ to parse an XML file:

xml_document<> xmlin;
stringstream input; //initialized somewhere else
xmlin.clear();
xmlin.parse<0>(&(input.str()[0]));

cout << "input:" << input.str() << endl << endl;

xml_node<char> *firstnode = xmlin.first_node();
string s_type = firstnode->first_attribute("type")->value();
cout << "type: " << s_type << endl;

However, I get this on stdout:

input:<?xml version="1.0" encoding="utf-8"?><testxml command="testfunction" type="exclusive" />

type: exclusive" /> 

What could cause this output when printing the s_type variable? It's very annoying, since I can't process the XML correctly.

There are 3 best solutions below

0
On

I think the problem is in the code you haven't shown. Start by trying this with a literal string; it works just fine for me:

xml_document<> xmlin;
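// Use a writable array rather than a pointer to a string literal:
// RapidXML modifies the buffer it parses.
char input[] = "<?xml version=\"1.0\" encoding=\"utf-8\"?><testxml command=\"testfunction\" type=\"exclusive\" />";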
xmlin.parse<0>(input);

xml_node<char> *firstnode = xmlin.first_node();
std::string s_type = firstnode->first_attribute("type")->value();
4
On

Actually I found the solution.

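RapidXML does fast in-situ parsing, which means it modifies the contents of the array it gets, and its nodes keep pointing into that array. A stringstream does not hand out such an array: input.str() returns a temporary copy that is destroyed at the end of the statement, so the parsed nodes are left pointing into freed memory.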
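To make "in-situ" concrete, here is a minimal sketch of the idea. It is not RapidXML's code: `attribute_value_in_situ` is a hypothetical helper written only for illustration. It returns a pointer into the caller's buffer and terminates the value by overwriting the closing quote, which is why the buffer has to be writable and has to outlive every pointer taken from it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of RapidXML: looks for name="value" in a
// writable, NUL-terminated buffer and returns a pointer to the value.
// Like an in-situ parser, it does not copy the value out; it overwrites
// the closing quote with '\0' inside the caller's buffer.
char* attribute_value_in_situ(char* buffer, const char* name) {
    assert(buffer != nullptr && name != nullptr && name[0] != '\0');
    const std::size_t name_len = std::strlen(name);
    for (char* p = std::strstr(buffer, name); p != nullptr;
         p = std::strstr(p + 1, name)) {
        // Accept only a whole attribute name: preceded by a space,
        // followed by =" .
        const bool starts_ok = (p != buffer) && (p[-1] == ' ');
        if (!starts_ok || p[name_len] != '=' || p[name_len + 1] != '"') {
            continue;
        }
        char* value = p + name_len + 2;
        char* end = std::strchr(value, '"');
        if (end == nullptr) {
            return nullptr;  // malformed input: no closing quote
        }
        *end = '\0';  // the in-place modification
        return value;
    }
    return nullptr;  // attribute not found
}
```

If the buffer is freed and its memory later refilled with the unmodified text, the '\0' written above is gone and a value read through the old pointer runs on past the closing quote. That would be consistent with the `exclusive" />` output in the question, although the exact behavior of reading freed memory is undefined.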

However, according to the docs, the string class does not like having its contents modified either.

From the string::c_str documentation page:

the values in this array should not be modified in the program

But when I copy the stream's contents into a named string and pass &buffer[0] (which, unlike c_str(), gives non-const access), it works as expected:

xml_document<> xmlin;
stringstream input; //initialized somewhere else
string buffer = input.str(); // named copy; must stay alive while xmlin is in use

xmlin.clear();
xmlin.parse<0>(&(buffer[0]));
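
The reason the named string helps can be checked without RapidXML. The sketch below (the function name is made up for illustration) shows that `str()` hands back an independent copy: writing into it does not touch the stream, and the copy lives as long as the variable that holds it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if modifying the string obtained from str() leaves the
// stream's own contents untouched, i.e. str() returned a copy.
bool str_returns_independent_copy() {
    const std::string text = "<testxml type=\"exclusive\" />";
    std::stringstream input(text);

    std::string buffer = input.str();  // named copy; lives until end of scope
    assert(!buffer.empty());
    buffer[0] = '#';                   // modify the copy in place

    // The stream still holds the original text; only the copy changed.
    return input.str() == text && buffer[0] == '#';
}
```

In the original code the copy was an unnamed temporary, so it was gone as soon as the parse statement finished.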
0
On

I would personally recommend this approach:

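 xml_document<> doc;
 string string_to_parse;                               // filled in elsewhere
 char* buffer = new char[string_to_parse.size() + 1];  // +1 for the terminating '\0'
 strcpy(buffer, string_to_parse.c_str());

 doc.parse<0>(buffer);
 // ... use doc here; its nodes point into buffer ...

 delete [] buffer;                                     // only once you are done with doc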

This makes a non-const char array out of the string you want to parse. I have always found this way safer and more reliable.

I used to do such crazy things as:

 string string_to_parse;  
 doc.parse<0>(const_cast<char*>(string_to_parse.c_str()));

and it "worked" for a long time (until the day it didn't, when I needed to reuse the original string). Since RapidXML can modify the char array it is parsing, and since modifying a std::string through c_str() is not allowed, I now always copy my string to a non-const char array and pass that to the parser. It may not be optimal and uses additional memory, but it is reliable, and I have never had any errors or problems with it. Your data will be parsed, and the original string can be reused without fear of it having been modified.
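
The same copy can be made without manual new/delete. This is only a sketch of one option: `make_parse_buffer` is a hypothetical helper, not something RapidXML provides. It copies the text into a `std::vector<char>` and appends the terminating '\0' that a parser working on a zero-terminated buffer expects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies text into a writable buffer and appends
// the terminating '\0'. The original string is left untouched.
std::vector<char> make_parse_buffer(const std::string& text) {
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');
    assert(buffer.size() == text.size() + 1);
    return buffer;
}
```

The result would then be passed as `doc.parse<0>(buffer.data())`. The vector frees its memory automatically, so it has to stay in scope for as long as `doc` and its nodes are used.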