Escape HTML except some special characters


To prevent HTML code injection and cross-site scripting, we have a filter for service requests that escapes characters using StringEscapeUtils.escapeHtml(text).

However, this also escapes non-ASCII characters such as ä, ö and ü. As a workaround, I use an exclude list: each listed character is replaced with its hash code before calling StringEscapeUtils.escapeHtml, and the hash values are converted back to the original characters afterwards. This solves the problem, but it is not a very elegant solution!

    // Assumes Apache Commons Lang 2.x, which provides StringEscapeUtils.escapeHtml
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.commons.lang.StringEscapeUtils;

    String[] excludeList = {"ü", "Ü", "ö", "Ö", "ä", "Ä", "ß"};

    private static String escapeHtml(String text, String[] exclusionList) {
        // maps each placeholder (hash code) back to the string it stands for
        Map<Integer, String> excludeTempMap = new TreeMap<Integer, String>();

        // replace characters from exclusionList in the text with their equivalent hashCode
        for (String excludePart : exclusionList) {
            // Pattern.LITERAL: match the entry as plain text, not as a regex
            Matcher matcher = Pattern.compile(excludePart, Pattern.LITERAL).matcher(text);

            while (matcher.find()) {
                String match = matcher.group();
                Integer matchHash = match.hashCode();

                text = matcher.replaceFirst(String.valueOf(matchHash));

                excludeTempMap.put(matchHash, match);

                matcher.reset(text);
            }
        }

        // escape malicious html characters
        text = StringEscapeUtils.escapeHtml(text);

        // replace back characters from exclusionList from hash values to string
        for (Map.Entry<Integer, String> excludeEntry : excludeTempMap.entrySet()) {
            // String.replace is literal, so neither argument is parsed as a regex
            text = text.replace(
                String.valueOf(excludeEntry.getKey()),
                excludeEntry.getValue()
            );
        }

        return text;
    }
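One reason the workaround is fragile can be shown with a small sketch. The hash code of a single-character string is simply that character's code point, so "ü" is masked as "252", and any literal 252 that was already in the text is turned into ü on the way back. The class and method names below are hypothetical, and the escapeHtml call is left out on the assumption that it does not change ASCII digits.

```java
// Hypothetical demo class, not part of the original filter.
final class HashPlaceholderPitfall {

    private static final String U_UMLAUT = "\u00fc"; // ü

    private HashPlaceholderPitfall() {
    }

    /** Masks ü with its hash code and restores it, as the workaround does. */
    static String roundTrip(String text) {
        // For a one-character string, hashCode() equals the char value: 252
        String placeholder = String.valueOf(U_UMLAUT.hashCode());

        String masked = text.replace(U_UMLAUT, placeholder);

        // StringEscapeUtils.escapeHtml(masked) would run here (omitted, see lead-in)

        // Restoring cannot tell placeholders apart from digits in the input
        return masked.replace(placeholder, U_UMLAUT);
    }
}
```

Calling HashPlaceholderPitfall.roundTrip("Zimmer 252") returns "Zimmer ü", although the input contained no umlaut at all.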

Does anyone have a tip for a better solution? Is there a better library that can be used to whitelist some language-specific characters?
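One direction that avoids the exclusion list altogether, sketched here under stated assumptions: escape only the five markup-significant characters and copy everything else through unchanged. This assumes the response is served as UTF-8 (so ä, ö, ü and ß need no entity encoding) and that the escaped text ends up in element content or in a quoted attribute value. MinimalHtmlEscaper is a hypothetical name, not a class from any library, and this is not a replacement for context-aware output encoding.

```java
// Hypothetical helper: a sketch, not a vetted security library.
final class MinimalHtmlEscaper {

    private MinimalHtmlEscaper() {
    }

    /** Escapes only & < > " ' and copies every other character unchanged. */
    static String escape(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    // Non-ASCII characters such as ü or ß pass through as-is
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, MinimalHtmlEscaper.escape("<b>Grüße</b>") yields "&lt;b&gt;Grüße&lt;/b&gt;", so the markup is neutralised while the umlauts stay readable.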
