I'm wondering if someone can help me remove the XML declaration from a string containing an XML document. Any help would be appreciated. We're using MSXML 4.0, but I had difficulties using it and ended up just doing a substring. I'm not very familiar with ATL and the other Microsoft SDKs. It works, but a little part of me died inside, and I would prefer to have this done in a less fragile manner.
Edit: Currently I am taking a substring from the first occurrence of a newline character. I was trying to tokenize or take a substring on the "?>" of the XML declaration, but I'm having trouble getting the character matching to work (using wcstok and substring). I tried "\?>", "\?>" and "?>". The ideal solution would be to load the document into an XMLDocument object and just get the text of the message body.
Look up the XML specification, particularly the grammar for the prolog. Your hand-rolled code should be able to parse VersionInfo, EncodingDecl and SDDecl, along with the start and end tokens of the XML declaration. For more information on these individual items, see the specification.
However, my suggestion would be to use the right tool for the job: use an XML toolkit/parser. (The difference between a parser and a toolkit is mainly that a toolkit supports advanced operations such as DTD validation, namespace handling, XPath, etc.)
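If you do stay with a hand-rolled approach for the time being, here is a minimal sketch of how the declaration could be stripped with std::wstring instead of wcstok. Two details probably explain the matching trouble: wcstok treats its delimiter argument as a set of single characters, so it splits on the first '?' or '>' rather than on the sequence "?>"; and in a C++ string literal "\?" is just an escape for '?', so "\?>" and "?>" are the same string. The function name strip_xml_declaration is hypothetical (it is not part of MSXML or any library), and the sketch assumes the string has no leading byte order mark and that the declaration, if present, is well-formed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MSXML or any library): returns the
// document text with a leading XML declaration removed, if one is present.
std::wstring strip_xml_declaration(const std::wstring& xml)
{
    const std::wstring open = L"<?xml";
    const std::wstring close = L"?>";
    const std::wstring space = L" \t\r\n";

    // The declaration, if present, must be the very first thing in the document.
    if (xml.compare(0, open.size(), open) != 0)
        return xml;

    // "<?xml" followed by whitespace is a declaration; "<?xml-stylesheet"
    // and similar are ordinary processing instructions and are kept.
    if (xml.size() <= open.size() ||
        space.find(xml[open.size()]) == std::wstring::npos)
        return xml;

    // find() matches the whole two-character sequence "?>", unlike wcstok,
    // which treats its delimiter argument as a set of single characters.
    const std::size_t end = xml.find(close, open.size());
    if (end == std::wstring::npos)
        return xml; // unterminated declaration: leave the input untouched

    // Skip the whitespace and newlines that follow the declaration.
    const std::size_t body = xml.find_first_not_of(space, end + close.size());
    return body == std::wstring::npos ? std::wstring() : xml.substr(body);
}
```

Calling strip_xml_declaration on the loaded string returns everything from the first non-whitespace character after "?>", whether or not a newline follows the declaration. Searching for "?>" is safe here only because the grammar does not allow that sequence inside the version, encoding or standalone values; this is still string surgery, and a real parser remains the more robust option.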
MSXML4 is pretty old; MSXML6 is the latest version. However, MSXML6 is not well suited to anything but small XML files, so choose a parser based on your input file size (if performance is important). There are freely available libraries such as Xerces, RapidXML and pugixml that have much better performance.
Also, can you specify what difficulties you have faced with MSXML4?