Java StringEscapeUtils.escapeHtml4 as regular text

My goal is to keep the accented letters of a message as regular text after calling StringEscapeUtils.escapeHtml4. Example text:

<html>
<body>
<p>éô</p>
</body>
</html>

I expect all the HTML tags to be escaped, but not the text itself, which here is: éô

Code example:

String original = "<html><head><\\head><body>éô";
System.out.println("original: " + original);

String translated = StringEscapeUtils.escapeHtml4(original);
System.out.println("translated: " + translated);

Output:

original: <html><head><\head><body>éô
translated: &lt;html&gt;&lt;head&gt;&lt;\head&gt;&lt;body&gt;&eacute;&ocirc;

I expect to get: &lt;html&gt;&lt;head&gt;&lt;\head&gt;&lt;body&gt;éô

Best answer

I think I found the solution, which is mentioned here: Escape HTML in Languages with Accented Letters

It creates a custom translator that uses only two of the lookup translators:

public static final CharSequenceTranslator ESCAPE_HTML4_CUSTOM =
        new AggregateTranslator(
                new LookupTranslator(EntityArrays.BASIC_ESCAPE()),
                new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE())
        );

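For comparison, the original StringEscapeUtils.escapeHtml4 uses three lookup translators. The extra one, ISO8859_1_ESCAPE, is what turns é and ô into &eacute; and &ocirc;: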

public static final CharSequenceTranslator ESCAPE_HTML4 =
        new AggregateTranslator(
                new LookupTranslator(EntityArrays.BASIC_ESCAPE()),
                new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE()),
                new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE())
        );
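
If only the markup characters need escaping, the same idea can be sketched without Commons Lang at all. This is not the library's implementation, just a minimal stand-in that handles the four characters covered by BASIC_ESCAPE (&, <, >, ") and copies everything else through unchanged. The class name BasicHtmlEscaper and the method escapeBasic are made up for this illustration. Unlike the custom translator above, it does not apply the HTML40_EXTENDED_ESCAPE table either.

```java
public class BasicHtmlEscaper {

    // Hypothetical helper: escapes only the four characters that BASIC_ESCAPE covers.
    // Everything else, including accented letters, is appended as-is.
    public static String escapeBasic(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same input as in the question; the last two characters are
        // e-acute and o-circumflex, written as Unicode escapes.
        String original = "<html><head><\\head><body>\u00e9\u00f4";
        System.out.println("translated: " + escapeBasic(original));
    }
}
```

With the input from the question, escapeBasic returns &lt;html&gt;&lt;head&gt;&lt;\head&gt;&lt;body&gt;éô, which is the expected string. The trade-off is that no other named entities are produced, so the page has to be served in an encoding such as UTF-8 that can represent the accented letters directly.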