XML:
<node>
Lorem ipsum
<child-node>dolor</child-node>
TEXT TO BE ACCESSED
</node>
<node>
sed do eiusmod tempor etc.
</node>
This is read into a rapidxml::xml_document<> and parsed with the flag rapidxml::parse_validate_closing_tags as follows: doc.parse<rapidxml::parse_validate_closing_tags>(). (I would have thought that this flag solved the issue, but this does not appear to be the case.)
RapidXML C++ code looping through all <node>s of doc:
for (const rapidxml::xml_node<> *node = doc.first_node("node"); node != nullptr; node = node->next_sibling()) {
    std::cout << node->value();
}
node->value() returns Lorem ipsum during the first iteration.
While the text within the <child-node> (dolor) is accessible by creating a new *node_2 = node->first_node() (within the loop) and then accessing the value with node_2->value(), the text that follows the <child-node> (TEXT TO BE ACCESSED) is not accessible in a similar way. The documentation does not offer much in terms of advice. How might this be done with RapidXML?
The XML is intended to encode an edition of a text (following e.g. Perseus Digital Library) and so the format used above is useful in order to mark specific words within sentences etc.
RapidXML parses XML into nodes of different types, in particular node_element and node_data nodes. For example, your <child-node>dolor</child-node> is actually a node_element node which contains a node_data node with the value "dolor".
To make user code simpler, getting the value() of a node_element returns the value of its first data node, but if you have complex markup you can iterate over the data nodes to extract those values.
Untested code below