How to use Google Caja HTML/CSS sanitizer JS library in Java


I have a Java API that accepts a custom CSS field. I need to sanitize the CSS before storing it in my database and would like to use Google Caja for this.

First, I tried running the Google Caja HTML/CSS sanitizer JavaScript library using the Rhino JavaScript engine. Unfortunately, that didn't work because that library depends heavily on the existence of a DOM (specifically, the window object).

Next, I imported the Caja project from the Maven repository. I looked through some of the tests, but could not find an example of how to use the sanitizer.

I could try bringing the browser to the server, but that seems a bit excessive.

Has anyone been able to use Caja to sanitize a CSS string in Java?

Thanks in advance!


Answer by Pau Carre:

Google Caja is also a Java project, so anything Caja can do can be run directly in Java. For example, have a look at Caja's own unit tests, which validate CSS directly in Java.

Answer by seanf:

If you plan to sanitise on a Java server, I would recommend using OWASP HTML Sanitizer, which is apparently based on code from Caja. It can also rewrite <a> elements so that they include rel="nofollow".

import org.owasp.html.PolicyFactory;
import static org.owasp.html.Sanitizers.BLOCKS;
import static org.owasp.html.Sanitizers.FORMATTING;
import static org.owasp.html.Sanitizers.IMAGES;
import static org.owasp.html.Sanitizers.LINKS;

// Combine the predefined policies; anything they do not allow is stripped.
PolicyFactory sanitiser = BLOCKS.and(FORMATTING).and(IMAGES).and(LINKS);
String htmlSanitised = sanitiser.sanitize(htmlSource);

That said, if you still want to invoke Caja's JavaScript sanitizer from Java, the following works with both Rhino (Java 7) and Nashorn (Java 8):

import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class CajaSanitiser {

    private final ScriptEngine engine;
    private final Bindings bindings;

    public CajaSanitiser() throws IOException, ScriptException {
        this.engine = new ScriptEngineManager().getEngineByName("js");
        this.bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
        String scriptName = "com/google/caja/plugin/html-css-sanitizer-minified.js";
        try (BufferedReader reader = getReader(scriptName)) {
            engine.eval(reader);
        }
        String identity = "function identity(value) {return value;}";
        engine.eval(identity);
    }

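    private BufferedReader getReader(String name) throws IOException {
        java.io.InputStream in = getClass().getClassLoader().getResourceAsStream(name);
        if (in == null) {
            // Fail with a clear message instead of a NullPointerException.
            throw new IOException("Script not found on classpath: " + name);
        }
        return new BufferedReader(new InputStreamReader(in, "UTF-8"));
    }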

    public String sanitise(String htmlSource) throws ScriptException {
        bindings.put("src", htmlSource);
        // You can use other functions beside 'identity' if you
        // want to transform the html.
        // See https://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
        return (String) engine.eval("html_sanitize(src, identity, identity)");
    }

    public static void main(String[] args) throws Exception {
        CajaSanitiser sanitiser = new CajaSanitiser();
        String source = "<html>\n" +
                "<head>\n" +
                "<style>\n" +
                "h1 {color:blue;}\n" +
                "</style>\n" +
                "</head>\n" +
                "<body>\n" +
                "<h1>A heading</h1>\n" +
                "</body>\n" +
                "</html>";
        System.out.println("Original HTML with CSS:");
        System.out.println(source);
        System.out.println();
        System.out.println("Sanitised HTML:");
        System.out.println(sanitiser.sanitise(source));
    }
}
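
One caveat with the class above: getEngineByName("js") returns null when no JavaScript engine is registered, which is the case on JDK 15 and later, where Nashorn was removed. The constructor would then fail with a NullPointerException. Below is a minimal sketch of a more defensive lookup using only javax.script and Optional (so Java 8 or later). The class name EngineLookup and the method findEngine are hypothetical helpers for illustration, not part of Caja, and the list of engine names to try is an assumption.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import java.util.Optional;

// Hypothetical helper, not part of Caja or the JDK.
public class EngineLookup {

    // Returns the first engine registered under any of the given names,
    // or an empty Optional if none of them is available on this JVM.
    public static Optional<ScriptEngine> findEngine(String... names) {
        ScriptEngineManager manager = new ScriptEngineManager();
        for (String name : names) {
            ScriptEngine engine = manager.getEngineByName(name);
            if (engine != null) {
                return Optional.of(engine);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The names to try are an assumption; adjust them to your runtime.
        Optional<ScriptEngine> engine = findEngine("js", "nashorn", "rhino");
        if (engine.isPresent()) {
            System.out.println("Using engine: "
                    + engine.get().getFactory().getEngineName());
        } else {
            System.out.println("No JavaScript engine found on this JVM.");
        }
    }
}
```

With something like this in place, the CajaSanitiser constructor could throw a descriptive exception when the Optional is empty, rather than failing later on engine.getBindings.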

I used this as part of my Maven configuration:

<dependencies>
    <dependency>
        <groupId>caja</groupId>
        <artifactId>caja</artifactId>
        <version>r5127</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>caja</id>
        <name>caja</name>
        <url>http://google-caja.googlecode.com/svn/maven</url>
    </repository>
</repositories>