Avoiding escaping character entity references when creating a text node in XML


I am using XML DOM techniques to build a pulldown menu in JavaScript.

After I create an <option> node, I append the text that should appear for that option. The problem is that when the text contains character entity references (CERs) such as &#8322;, the & character of the CER is escaped to &amp;, so the select menu shows the CER itself rather than the character when the menu is output to the page.

I have tried both of the following methods:

optionNode.appendChild(xmlDoc.createTextNode(label));

and

optionNode.textContent = label;

And both give the same result. I can work around the problem by doing a global replace of &amp; with & after I output the XML document to text:

var xml = (new XMLSerializer()).serializeToString(xmlDoc);
return xml.replace(/&amp;/g, '&');

But I'm sure there must be a way to avoid the escaping in the first place. Is there?


There are 2 best solutions below

Mads Hansen On

You could use createCDATASection() instead of createTextNode(). Text inside a CDATA section is serialized as-is, so the & is not escaped to &amp;:

// Parse a minimal document to get an XML Document to work with.
var docu = new DOMParser().parseFromString('<xml></xml>', 'application/xml');
// A CDATA section holds its text verbatim; markup characters are not escaped.
var cdata = docu.createCDATASection('Some <CDATA> data & then some');
docu.getElementsByTagName('xml')[0].appendChild(cdata);

alert(new XMLSerializer().serializeToString(docu));
// Displays: <xml><![CDATA[Some <CDATA> data & then some]]></xml>
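
One caveat that may be worth sketching: per the DOM specification, createCDATASection() throws if the data contains the sequence ]]>, because that sequence would end the section early. A common workaround is to split the text so that each piece goes into its own CDATA section. The helper name splitForCDATA below is hypothetical (it is not a DOM API), and this is only a minimal sketch assuming plain string input:

```javascript
"use strict";

// Hypothetical helper (not part of the DOM): splits text so that no chunk
// contains "]]>". Each chunk can then go into its own CDATA section.
function splitForCDATA(text) {
    var chunks = [];
    var start = 0;
    var idx = text.indexOf(']]>', start);
    while (idx !== -1) {
        // Keep "]]" in this chunk; the ">" starts the next one.
        chunks.push(text.slice(start, idx + 2));
        start = idx + 2;
        idx = text.indexOf(']]>', start);
    }
    chunks.push(text.slice(start));
    return chunks;
}

// Browser-only usage, skipped where DOMParser is unavailable.
if (typeof DOMParser !== 'undefined') {
    var doc = new DOMParser().parseFromString('<xml></xml>', 'application/xml');
    splitForCDATA('a ]]> b & c').forEach(function (chunk) {
        doc.documentElement.appendChild(doc.createCDATASection(chunk));
    });
}
```

Also keep in mind that CDATA content is not parsed for references, so a label such as H&#8322;O stays as those literal characters for any XML parser that later reads the result.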
Robert Grossman On

I found a solution. Before I create a node containing label, I convert all the character entity references in label to Unicode characters. Then, when I output the XML as a string, I convert all the non-ASCII characters back to numeric character entity references. The code is adapted from code that I found elsewhere on Stack Overflow.

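function cerToUnicode(str) {
    "use strict";
    var entity_table = {
        '&quot;': String.fromCharCode(34), // Quotation mark. Not required
        '&amp;': String.fromCharCode(38), // Ampersand
        '&lt;': String.fromCharCode(60), // Less-than sign
        '&gt;': String.fromCharCode(62), // Greater-than sign
        '&nbsp;': String.fromCharCode(160), // Non-breaking space
        '&iexcl;': String.fromCharCode(161) // Inverted exclamation mark
        // ... other named CERs
    };
    // Decimal numeric references. &#38; becomes '&amp;' rather than '&' so
    // that the named pass below cannot decode the same text a second time.
    str = str.replace(/&#(\d+);/g,
        function (matched, capture1) {
            var code = parseInt(capture1, 10);
            if (code === 38) {
                return '&amp;';
            }
            // Leave out-of-range references untouched instead of throwing.
            return (code <= 0x10FFFF ? String.fromCodePoint(code) : matched);
        });
    // Named references. Unknown names are left as-is.
    str = str.replace(/&[A-Za-z][A-Za-z0-9]*;/g,
        function (matched) {
            return (entity_table.hasOwnProperty(matched) ? entity_table[matched] : matched);
        });
    return str;
} // cerToUnicode()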

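function unicodeToCER(str) {
    "use strict";
    // The u flag makes the regex match whole code points, so a character
    // outside the Basic Multilingual Plane yields one reference, not two
    // invalid surrogate references.
    return str.replace(/[^\x00-\x7F]/gu, function (s) {
        return "&#" + s.codePointAt(0) + ";";
    });
} // unicodeToCER()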
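
As a variation on the decoding step, the sketch below handles decimal, hexadecimal and named references in one regex pass, so decoded output is never rescanned and no special case for &#38; is needed. The names decodeRefs, fromCode and NAMED are made up for illustration, and the table is deliberately tiny; treat it as a sketch under those assumptions rather than a complete decoder:

```javascript
"use strict";

// Hypothetical, deliberately small table of named references.
var NAMED = { quot: '"', amp: '&', lt: '<', gt: '>', apos: "'", nbsp: '\u00A0' };

// Converts a code point to a string, or returns the fallback when the
// value is outside the Unicode range (String.fromCodePoint would throw).
function fromCode(code, fallback) {
    return code <= 0x10FFFF ? String.fromCodePoint(code) : fallback;
}

// Decodes &#NNN;, &#xHHH; and known &name; references in a single pass.
// Anything unrecognized is returned unchanged.
function decodeRefs(str) {
    return str.replace(/&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g,
        function (matched, dec, hex, name) {
            if (dec !== undefined) {
                return fromCode(parseInt(dec, 10), matched);
            }
            if (hex !== undefined) {
                return fromCode(parseInt(hex, 16), matched);
            }
            return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : matched;
        });
}
```

With this, decodeRefs('H&#8322;O') and decodeRefs('H&#x2082;O') give the same string containing the subscript-two character, while '&amp;lt;' is decoded only once, to '&lt;'.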