I need to do some parsing and information retrieval from XML documents. The XML document is bound to an XML data binding, then parsed for specific elements. Once I have isolated the elements I need to dissect, I take each one in turn (let's call it E_parent), try to identify the location of each non-text child element (E_child) within the overall XML text of E_parent, and do some manipulation on it.
The problem I'm having is that the XML document's namespace is added to the child elements' XML when they are accessed individually.
To give an example, say the original document looks like:
<?xml version="1.0" encoding="windows-1252"?>
<RootNode xml:lang="en" xmlns="urn:blah:names:blahblah">
<E_parent>Some text <E_child>child text</E_child> more parent text</E_parent>
</RootNode>
When I try to access the XML from either the E_parent or E_child element by doing something like:
xmlParent := parentNode.XML;
I get:
<E_parent xmlns="urn:blah:names:blahblah">Some text <E_child>child text</E_child> more parent text</E_parent>
The same thing happens if I try to access the XML for E_child; I get:
<E_child xmlns="urn:blah:names:blahblah">child text</E_child>
That's a problem when I then try to do a text search on the parent element, since the "real" text does not contain that namespace declaration:
Some text <E_child>child text</E_child> more parent text
So far, I've dealt with this by finding/deleting unwanted namespace attributes in the strings, but it's highly inefficient, and kind of ugly ;o) So, my question is, how can I retrieve the various nodes' XML from a bound XML document, without the document namespace being added to the tags?
=========
Thanks Remy, it was so obvious: I just need to start from a blank string and build it up rather than start from the inner XML!
Note though, that this is a better workaround than the one I had for this specific situation, but not quite what I wanted - obtaining the XML of elements without the namespace would still be useful for other things, such as logging, where I would want the exact XML of the node as it appears in the original document.
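For anyone reading later, here is a rough sketch of the "start from a blank string and build it up" idea. It is only an illustration, not tested against every parser: the names AppendContent, Buffer and Positions are made up for this example, the unit names assume a recent Delphi (older versions use Classes, SysUtils, Variants and XMLIntf), and it assumes child elements carry no attributes. Only text and element nodes are handled; comments, CDATA sections and so on are skipped.

```delphi
uses
  System.Classes, System.SysUtils, System.Variants, Xml.XMLIntf;

// Hypothetical helper: walks the children of Node, rebuilding their markup
// in Buffer without any xmlns declarations, and records in Positions the
// offset at which each child element's opening tag starts.
procedure AppendContent(const Node: IXMLNode; var Buffer: string;
  Positions: TStrings);
var
  I: Integer;
  Child: IXMLNode;
begin
  for I := 0 to Node.ChildNodes.Count - 1 do
  begin
    Child := Node.ChildNodes[I];
    case Child.NodeType of
      ntText:
        // Plain text between tags lives in its own child node.
        Buffer := Buffer + VarToStr(Child.NodeValue);
      ntElement:
        begin
          // Length(Buffer) is the 0-based offset of this tag in the text
          // built so far. Assumption: the element has no attributes.
          Positions.Add(Format('%s=%d', [Child.NodeName, Length(Buffer)]));
          Buffer := Buffer + '<' + Child.NodeName + '>';
          AppendContent(Child, Buffer, Positions); // nested content
          Buffer := Buffer + '</' + Child.NodeName + '>';
        end;
    end;
  end;
end;
```

Call it as AppendContent(parentNode, Buffer, Positions) with Buffer set to an empty string beforehand. The recorded offsets are 0-based, so add 1 before using them with Copy or Pos. Two caveats: the DOM returns text with entities already decoded (so &amp; counts as one character), and whether whitespace next to tags survives depends on the parser options (poPreserveWhiteSpace in TXMLDocument.ParseOptions), so the offsets may not match the raw file exactly in those cases.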
Use the DOM for processing E_parent's contents. Rather than retrieving the XML of E_parent and then searching for an E_child tag inside of it, use the DOM to determine what plain text exists in front of the E_child node (the plain text will have its own child node). The length of that plain text will tell you the exact text position of E_child without needing to retrieve E_parent's XML at all. E_parent will have multiple plain-text child nodes in the relevant positions for each section of untagged text.
In other words, given the XML you showed, the structure of the DOM will look something like this: