I want to parse the following XML-Code:
(cxml:parse "<BEGIN><URL>www.some.de/url?some=data&bad=stuff</URL></BEGIN>" (stp:make-builder))
this results in
#<CXML:WELL-FORMEDNESS-VIOLATION "~A" {1003C5E163}>
as '&' is a XML special character. But if I use &?
instead the result is:
(cxml:parse "<BEGIN><URL>www.some.de/url?some=data&bad=stuff</URL></BEGIN>" (stp:make-builder))
=>#.(CXML-STP-IMPL::DOCUMENT
:CHILDREN '(#.(CXML-STP:ELEMENT
#| :PARENT of type DOCUMENT |#
:CHILDREN '(#.(CXML-STP:ELEMENT
#| :PARENT of type ELEMENT |#
:CHILDREN '(#.(CXML-STP:TEXT
#| :PARENT of type ELEMENT |#
:DATA "www.some.de/url?some=data")
#.(CXML-STP:TEXT
#| :PARENT of type ELEMENT |#
:DATA "&")
#.(CXML-STP:TEXT
#| :PARENT of type ELEMENT |#
:DATA "bad=stuff"))
:LOCAL-NAME "URL"))
:LOCAL-NAME "BEGIN")))
Which is not exactly what I expected as there should only be one CXML-STP:TEXT child with DATA "www.some.de/url?some=data&bad=stuff"
How can I fix this wrong(?) behavior?
This behavior, although, not very convenient, is, actually, present in many other XML parsers as well. Probably the reason for it is to be able to parse arbitrary XML entities and apply some user-defined rules to them. Although, it may be just a by-product of the parser implementation. I couldn't find out yet.
For the SAX variant of the parser I came to the following approach:
I.e. I collect the text in
sax:characters
and process it insax:end-element
. In STP you, probably, can get away even easier by just concatenating neighboringtext
elements.