How to bypass parameter-entity resolution in DTD parsing


I maintain a Java program that helps with Mozilla localization. Among other formats, Mozilla localization uses DTD files (at runtime, the DTDs for the selected locale are injected into the XML files that define the UI, which results in localized interfaces).

The DTDs sometimes include other files through parameter-entity (PE) references, like this:

<!ENTITY % brandDTD SYSTEM "chrome://branding/locale/brand.dtd">
%brandDTD;

However, for my program's purposes, when I parse the en-US version of a DTD file that contains such a PE reference, I just need to copy those lines verbatim into the target localized file.
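If these lines only need to be copied unchanged, one option that avoids the parser entirely is to recognize them textually. The following is a minimal sketch, not code from the program: the class `PeIncludeLines` and its `extract` method are hypothetical names, and it assumes each `<!ENTITY % name SYSTEM "...">` declaration and each `%name;` reference sits on a line of its own, as in the example above. It swaps SAX for plain regular expressions, so it does not understand comments or declarations split across several lines.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of MozillaTranslator): picks out external
// parameter-entity declarations and their %name; references so that those
// lines can be copied verbatim into the localized DTD.
// Assumption: each declaration and each reference is on its own line.
class PeIncludeLines {
    // <!ENTITY % name SYSTEM "uri">   (single- or double-quoted system ID)
    private static final Pattern PE_DECL = Pattern.compile(
            "\\s*<!ENTITY\\s+%\\s+([^\\s\"'>]+)\\s+SYSTEM\\s+([\"'])[^\"']*\\2\\s*>\\s*");
    // %name;   alone on a line
    private static final Pattern PE_REF = Pattern.compile("\\s*%([^\\s;]+);\\s*");

    static List<String> extract(String dtdText) {
        Set<String> declared = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : dtdText.split("\\R")) {
            Matcher decl = PE_DECL.matcher(line);
            if (decl.matches()) {
                declared.add(decl.group(1));   // remember the PE name
                result.add(line);              // keep the line exactly as written
                continue;
            }
            Matcher ref = PE_REF.matcher(line);
            // Only keep references to parameter entities declared earlier.
            if (ref.matches() && declared.contains(ref.group(1))) {
                result.add(line);
            }
        }
        return result;
    }
}
```

For the `brandDTD` example above, `extract` returns the two lines shown there, in order, and ignores ordinary `<!ENTITY name "value">` declarations.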

I inherited the program, which uses a trick (a clever one, to me) to parse the DTD with SAX (the Xerces parser bundled with the JDK):

public class DTDReadHelper extends DefaultHandler2 {
    private static final String dummyXml="<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<!DOCTYPE dialog SYSTEM \"MozillaTranslator\">" +
            "<dialog></dialog>";
(...)
    @Override
    public InputSource resolveEntity(java.lang.String name, java.lang.String publicId,
            java.lang.String baseURI, java.lang.String systemId) {

        if ((name != null) && (name.startsWith("%"))) {
           return new InputSource(new StringReader(""));
        } else {
            return new InputSource(is);
        }
    }
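The dummy-document trick can be shown end to end as a self-contained sketch. It is not the program's actual code: the class `DtdEntityReader` and its `readEntities` method are hypothetical, and it assumes the DTD text is already in memory as a `String`. The resolver recognizes the external subset by the SAX2 `EntityResolver2` name `"[dtd]"`, builds a new reader on every call, and resolves every other external entity (such as `%brandDTD;`) to empty content. Internal entity declarations are collected through the `DeclHandler` callbacks that `DefaultHandler2` implements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Sketch of the "dummy document" trick: SAX cannot parse a bare DTD, so we
// parse a tiny XML document whose DOCTYPE names a made-up system ID and serve
// the real DTD text when the parser asks for the external subset.
class DtdEntityReader extends DefaultHandler2 {
    private static final String DUMMY_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE dialog SYSTEM \"MozillaTranslator\">"
            + "<dialog></dialog>";

    private final String dtdText;
    private final Map<String, String> entities = new LinkedHashMap<>();

    private DtdEntityReader(String dtdText) {
        this.dtdText = dtdText;
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        if ("[dtd]".equals(name)) {
            // External DTD subset: a new reader on every call, so a repeated
            // request never sees an already-consumed stream.
            return new InputSource(new StringReader(dtdText));
        }
        // Any other external entity (e.g. %brandDTD;) resolves to nothing.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        // Parameter entities are reported with a leading '%'; skip them.
        if (!name.startsWith("%")) {
            entities.put(name, value);
        }
    }

    static Map<String, String> readEntities(String dtdText) {
        DtdEntityReader handler = new DtdEntityReader(dtdText);
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // DefaultHandler2 implements EntityResolver2, so the 4-argument
            // resolveEntity (with the entity name) is the one called.
            reader.setEntityResolver(handler);
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(DUMMY_XML)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.entities;
    }
}
```

The sketch sets the declaration-handler property explicitly because `SAXParser.parse(InputSource, DefaultHandler)` installs the content, entity, error, and DTD handlers but not the SAX2 declaration handler.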

The problem is that the else branch is reached twice during a typical parse, which results in an IOException from Xerces if the file is smaller than 8192 bytes, or in a SAXParseException otherwise.

I can deal with the IOException, because the entire file has already been parsed when it is thrown, but I can't deal with the SAXParseException, because it is thrown at the very beginning of the parse (lineNumber 1, columnNumber 2).

The "offending" DTD file can be found here:

http://hg.mozilla.org/mozilla-central/file/13fe5ad0364d/browser/locales/en-US/chrome/overrides/netError.dtd

and the entire source code of the program lives here:

https://kenai.com/projects/moztrans/sources

My SSCCE consists of four files (simplified versions of those in the repository above); I can upload them somewhere if needed.

The question, after all these details, is: is there any way to make SAX skip resolving the PE references?

TIA

Answer

I finally found the solution (if you look at the repository, you will notice that I committed the fix some months ago and forgot to update this question). :-) The solution lies in this code:

// private static final String brandDummyDtd = "<!ENTITY brandDTD \"\">";
private static final String brandDummyDtd = "";
(...)
@Override
public InputSource resolveEntity(java.lang.String name, java.lang.String publicId,
        java.lang.String baseURI, java.lang.String systemId) {

    if (name == null) {
        switch (systemId) {
            // Trick to get a SAX XML Parser to parse a DTD
            case "MozillaTranslator":
                return new InputSource(is);

            // Trick to resolve references to brand.dtd without
            // actually having to resolve, load and parse
            // the chrome: URI
            case "chrome://branding/locale/brand.dtd":
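            default:
                // brand.dtd and every other system ID share this branch:
                // the parser gets an empty document instead of the real file
                return new InputSource(new ByteArrayInputStream(brandDummyDtd.getBytes()));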
        }
    } else {
        return new InputSource(new StringReader(""));
    }
}

The solution is to return an InputSource with empty content (the brandDummyDtd field, defined as private static final at the top). I've kept the commented-out version, which declares an entity named after the one that triggers resolveEntity (brandDTD), to show an approach that does not work: if I remember correctly, it fails when the parsed DTD contains an actual brandDTD entity.
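The answer's approach can be combined with the original goal of copying the PE lines into the localized file. This is an illustration under stated assumptions, not the repository's code: `DtdScan`, `run`, `entities` and `peIncludes` are hypothetical names. Like the answer, it serves real content only for the dummy `MozillaTranslator` system ID and returns empty content for everything else, so the chrome: URI is never fetched. Unlike the answer, it omits the `name == null` check and relies on the literal system ID that the 4-argument `EntityResolver2` callback receives. It also records external parameter-entity declarations through `externalEntityDecl`, so their names and system IDs are available for writing the `<!ENTITY % ... SYSTEM ...>` lines back out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Sketch in the spirit of the answer: only the dummy system ID gets real
// content; every other external entity (brand.dtd included) resolves to an
// empty document. External PE declarations are recorded for copying.
class DtdScan extends DefaultHandler2 {
    static final String DUMMY_SYSTEM_ID = "MozillaTranslator";
    private static final String DUMMY_XML = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE dialog SYSTEM \"" + DUMMY_SYSTEM_ID + "\"><dialog/>";

    private final String dtdText;
    final Map<String, String> entities = new LinkedHashMap<>();   // name -> value
    final Map<String, String> peIncludes = new LinkedHashMap<>(); // %name -> system ID

    DtdScan(String dtdText) {
        this.dtdText = dtdText;
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        if (DUMMY_SYSTEM_ID.equals(systemId)) {
            return new InputSource(new StringReader(dtdText)); // the DTD itself
        }
        return new InputSource(new StringReader(""));          // everything else
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        if (!name.startsWith("%")) {
            entities.put(name, value);
        }
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        // SAX reports parameter-entity names with a leading '%'.
        if (name.startsWith("%")) {
            peIncludes.put(name, systemId);
        }
    }

    DtdScan run() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(this);
            reader.setEntityResolver(this);
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", this);
            reader.parse(new InputSource(new StringReader(DUMMY_XML)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return this;
    }
}
```

The system ID in `peIncludes` is reported as the parser resolved it; for an absolute URI such as chrome://branding/locale/brand.dtd this should be the URI itself, but that is an assumption worth checking before writing it back out.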