How to parse XML with cxml and stp containing ampersand

Question


Asked by Sim on 17 August 2025 at 07:31 · 567 views

I want to parse the following XML:

```lisp
(cxml:parse "<BEGIN><URL>www.some.de/url?some=data&bad=stuff</URL></BEGIN>" (stp:make-builder))
```

This results in

```
#<CXML:WELL-FORMEDNESS-VIOLATION "~A" {1003C5E163}>
```

because `&` is a special character in XML. But if I escape it as `&amp;` instead, the result is:

```lisp
(cxml:parse "<BEGIN><URL>www.some.de/url?some=data&amp;bad=stuff</URL></BEGIN>" (stp:make-builder))
=>#.(CXML-STP-IMPL::DOCUMENT
   :CHILDREN '(#.(CXML-STP:ELEMENT
                  #| :PARENT of type DOCUMENT |#
                  :CHILDREN '(#.(CXML-STP:ELEMENT
                                 #| :PARENT of type ELEMENT |#
                                 :CHILDREN '(#.(CXML-STP:TEXT
                                                #| :PARENT of type ELEMENT |#
                                                :DATA "www.some.de/url?some=data")
                                             #.(CXML-STP:TEXT
                                                #| :PARENT of type ELEMENT |#
                                                :DATA "&")
                                             #.(CXML-STP:TEXT
                                                #| :PARENT of type ELEMENT |#
                                                :DATA "bad=stuff"))
                                 :LOCAL-NAME "URL"))
                  :LOCAL-NAME "BEGIN")))
```

This is not what I expected: there should be only one `CXML-STP:TEXT` child, with `:DATA "www.some.de/url?some=data&bad=stuff"`.

How can I fix this wrong(?) behavior?


**Vsevolod Dyomkin** · Accepted Answer

This behavior, although not very convenient, is actually present in many other XML parsers as well. Probably the reason for it is to be able to parse arbitrary XML entities and apply user-defined rules to them, although it may just be a by-product of the parser implementation; I haven't been able to find out yet.
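To see this at the SAX level, here is a minimal sketch, assuming cxml can be loaded via Quicklisp. The class `chunk-recorder` and the function `text-chunks` are hypothetical names used only for illustration: the handler records every string passed to `sax:characters`, and its `sax:end-document` method supplies the return value of `cxml:parse`. Since STP built three `CXML-STP:TEXT` nodes for the `&amp;` input in the question, the text should arrive in more than one chunk, split around the entity reference.

```lisp
;; Assumes Quicklisp is installed and can load cxml.
(ql:quickload "cxml")

;; Hypothetical handler: SAX:DEFAULT-HANDLER ignores every event we do not
;; define a method for, so only CHARACTERS and END-DOCUMENT matter here.
(defclass chunk-recorder (sax:default-handler)
  ((chunks :accessor chunks :initform '())))

(defmethod sax:characters ((handler chunk-recorder) data)
  ;; Record each chunk separately; copy it defensively in case the
  ;; parser reuses its buffer.
  (push (copy-seq data) (chunks handler)))

(defmethod sax:end-document ((handler chunk-recorder))
  ;; Whatever END-DOCUMENT returns becomes the result of CXML:PARSE.
  (reverse (chunks handler)))

(defun text-chunks (xml)
  "Hypothetical helper: return the strings passed to SAX:CHARACTERS while parsing XML."
  (cxml:parse xml (make-instance 'chunk-recorder)))
```

For example, `(text-chunks "<URL>www.some.de/url?some=data&amp;bad=stuff</URL>")` should return several strings whose concatenation is the full URL, which is why a consumer has to join them itself.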

For the SAX variant of the parser, I came up with the following approach:

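```lisp
;; SAX:DEFAULT-HANDLER ignores all events we don't define methods for.
(defclass my-sax (sax:default-handler)
  ((title :accessor title :initform nil)
   (tag   :accessor tag   :initform nil)
   (text  :accessor text  :initform "")))

(defmethod sax:start-element ((sax my-sax) namespace-uri local-name
                              qname attributes)
  (declare (ignore namespace-uri qname attributes))
  ;; Remember the current element and start a fresh text buffer.
  (with-slots (tag text) sax
    (setf tag local-name
          text "")))

(defmethod sax:characters ((sax my-sax) data)
  ;; One element's text may arrive in several calls (e.g. split around
  ;; &amp;), so TEXT is accumulated rather than overwritten.
  (with-slots (title tag text) sax
    (cond ((string= tag "text")
           (setf text (concatenate 'string text data)))
          ((string= tag "title")
           (setf title data)))))

(defmethod sax:end-element ((sax my-sax) namespace-uri local-name qname)
  (declare (ignore namespace-uri qname))
  (when (string= "text" local-name)
    ;; (text sax) now holds the element's complete text; process it here.
    (text sax)))
```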

That is, I collect the text in `sax:characters` and process it in `sax:end-element`. In STP, you can probably get away with something even simpler: just concatenate neighboring text nodes.
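A minimal sketch of that STP idea, assuming cxml and cxml-stp can be loaded via Quicklisp. `merge-adjacent-text` is a hypothetical helper, not part of cxml-stp: it walks the tree and folds each run of adjacent `stp:text` children into the first node of the run, detaching the rest.

```lisp
;; Assumes Quicklisp is installed and can load cxml and cxml-stp.
(ql:quickload '("cxml" "cxml-stp"))

(defun merge-adjacent-text (node)
  "Hypothetical helper: destructively merge each run of adjacent STP:TEXT
children of NODE into one text node, recursing into child elements.
Returns NODE."
  (let ((previous nil))                 ; first text node of the current run
    ;; Iterate over a copy of the child list so detaching is safe.
    (dolist (child (copy-list (stp:list-children node)))
      (cond ((typep child 'stp:text)
             (if previous
                 (progn
                   ;; Append this chunk to the run's first node, then drop it.
                   (setf (stp:data previous)
                         (concatenate 'string (stp:data previous)
                                      (stp:data child)))
                   (stp:detach child))
                 (setf previous child)))
            (t
             ;; Any non-text child ends the current run.
             (setf previous nil)
             (when (typep child 'stp:element)
               (merge-adjacent-text child))))))
  node)
```

After parsing the `&amp;` version with `(stp:make-builder)`, calling `(merge-adjacent-text doc)` should leave the `URL` element with a single text child. If you only need the text rather than a normalized tree, `stp:string-value` on the `URL` element already returns the concatenated string without modifying anything.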
