How to escape strings with numeric character references in Java

Hello and thank you for reading my post.

The Apache Commons StringEscapeUtils.escapeHtml3() and StringEscapeUtils.escapeHtml4() functions can, among other things, convert accented characters in a string (such as é or à) into character entity references of the form &name;, where name is a case-sensitive alphanumeric string.

How can I get the escaped string of a given string with numeric character references instead (&#nnnn; or &#xhhhh; where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form)?
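Both forms come straight from the character's code point. Here is a minimal JDK-only sketch, not a Commons API: the class and method names are hypothetical. It escapes every code point from 0x7F upward, plus the five XML special characters, as either &#nnnn; or &#xhhhh;. It does not deal with control characters that XML forbids.

```java
public class NumericRefs {
    // Hypothetical helper: writes the XML special characters and every code point
    // >= 0x7F as a numeric character reference, decimal (&#nnnn;) or hex (&#xhhhh;).
    public static String escape(String s, boolean hex) {
        StringBuilder sb = new StringBuilder(s.length());
        // codePoints() yields whole code points, so characters outside the BMP
        // (stored as surrogate pairs) become a single reference.
        s.codePoints().forEach(cp -> {
            boolean special = cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp == '\'';
            if (special || cp >= 0x7F) {
                sb.append(hex ? "&#x" + Integer.toHexString(cp).toUpperCase() : "&#" + cp).append(';');
            } else {
                sb.append((char) cp); // plain ASCII is copied unchanged
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9 \u00e0 1 < 2", false)); // caf&#233; &#224; 1 &#60; 2
        System.out.println(escape("caf\u00e9", true));               // caf&#xE9;
    }
}
```

Escaping &, <, > and quotes numerically is also valid XML (&#60; is equivalent to &lt;), which keeps the sketch to a single rule.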

I actually need to escape strings for an XML document, which does not know about entities such as &eacute;, &agrave;, etc. (XML only predefines &amp;, &lt;, &gt;, &quot; and &apos;).

Best regards.

Answer 1

To solve this problem, I wrote a method that takes a string and replaces each character entity reference in it (such as &eacute;) with the corresponding numeric character reference (&#233; in this case).

For the mapping, I used this copy of the W3C XHTML 1.0 Latin-1 entity set (xhtml-lat1.ent): http://www.sagehill.net/livedtd/xhtml1-transitional/xhtml-lat1.ent.html
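The method itself is not shown in the answer. A minimal sketch of the same idea, assuming Java 9 or later: a regex finds each named reference, and a lookup table maps the name to its code point. The class name and the five-entry table are hypothetical. A full version would be built from every entry in the xhtml-lat1.ent list.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityToNumeric {
    // Hypothetical excerpt of the XHTML Latin-1 entity table (name -> code point).
    static final Map<String, Integer> LAT1 = Map.of(
        "eacute", 0xE9,
        "egrave", 0xE8,
        "agrave", 0xE0,
        "ccedil", 0xE7,
        "nbsp", 0xA0);

    // Matches a named reference such as &eacute; (entity names are case-sensitive).
    static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    public static String toNumeric(String s) {
        Matcher m = NAMED.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            Integer cp = LAT1.get(m.group(1));
            // Names not in the table (including XML's own &amp; &lt; &gt; &quot; &apos;) are kept as-is.
            String rep = (cp == null) ? m.group() : "&#" + cp + ";";
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("caf&eacute; &amp; &agrave;")); // caf&#233; &amp; &#224;
    }
}
```

Applied to the output of escapeHtml4(), this turns &eacute; into &#233; while leaving &amp; untouched. This works only for names that appear in the table.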

Note: it would be useful if StringEscapeUtils.escapeHtml4() accepted an extra argument that selects character entity references or numeric character references for its output.

Answer 2

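Create a CharSequenceTranslator. StringEscapeUtils comes from Apache Commons Text, and CharSequenceTranslator and NumericEntityEscaper come from its org.apache.commons.text.translate package. The older commons-lang3 versions of these classes are deprecated.

// ESCAPE_XML11 runs first; every code point >= 0x7F it leaves alone becomes &#nnnn;
CharSequenceTranslator XML_ESCAPE = StringEscapeUtils.ESCAPE_XML11.with(
    NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE));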

and use it:

XML_ESCAPE.translate(…)