How to set system and public ID without validating or checking DTD?


Not sure if it's just me or the API but I am simply not able to create an XML file without having either an exception thrown at me or the thing I'm trying to set (DocType) not being set.

This is what I'm currently doing:

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
stringBuilder.append("<!DOCTYPE document>");

String xmlString = AnnotatedDocumentTree.toString(annotatedDocumentTree, new SimpleAnnotatedDocumentTreeXmlConverter(), stringBuilder);

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;          
Document finalDocument = null;                 

StringWriter writer = new StringWriter();

try {

    icBuilder = icFactory.newDocumentBuilder(); 

    finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(xmlString.getBytes("UTF-8"))));                

    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    DocumentType doctype = xmlDocument.getDoctype();                    

    transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
    transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
    transformer.transform(new DOMSource(finalDocument), new StreamResult(writer));

    finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(writer.toString().getBytes("UTF-8"))));


} catch (Exception e) {
    e.printStackTrace();
}

However, this way I'm getting an exception. I can use the DocumentBuilderFactory and configure it like this:

icFactory.setValidating(false);
icFactory.setNamespaceAware(true);
icFactory.setFeature("http://xml.org/sax/features/namespaces", false);
icFactory.setFeature("http://xml.org/sax/features/validation", false);
icFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
icFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

but then DocType of my finalDocument will be null.

Setting my own EntityResolver won't do the trick either:

builder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId.contains(".dtd")) {
            return new InputSource(new StringReader(""));
        } else {
            return null;
        }
    }
});

because when I set the system ID to doctype.getSystemId(), I want exactly that value to end up in the document.

Is there a way to just set it without this headache?


Essentially I want to parse this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document>
<ds>
    ABGB <cue>: §§ 786 , 810 , 812 </cue>Die Kosten der ... 
    <cue>von</cue>
    <Relation bewertung="1">7 Ob 56/10a </Relation>= 
    <Relation bewertung="1">Zak 2010/773 , 440 </Relation>. 
</ds>

and transform it into this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ds PUBLIC "-//MBO//DTD artikel-at 1.0//DE" "http://dtd.company.de/dtd-at/artikel.dtd">
<ds>
    ABGB <cue>: §§ 786 , 810 , 812 
    </cue>Die Kosten der ... <cue>
    von 
    </cue><Relation bewertung="1">7 Ob 56/10a </Relation>= 
    <Relation bewertung="1">Zak 2010/773 , 440 </Relation>. 
</ds>

There are 2 best solutions below

Best answer

For me, your code works if the DTD exists at the specified location (the system ID); otherwise, adding an entity resolver as in the code below does the trick.

I don't have xmlDocument, so I hardcoded the values:

    StringBuilder stringBuilder = new StringBuilder();
    stringBuilder.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
    stringBuilder.append("<!DOCTYPE document><document/>");

    String xmlString = stringBuilder.toString(); // AnnotatedDocumentTree.toString(annotatedDocumentTree, new SimpleAnnotatedDocumentTreeXmlConverter(), stringBuilder);

    DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder icBuilder;          
    Document finalDocument = null;                 

    StringWriter writer = new StringWriter();

    try {

        icBuilder = icFactory.newDocumentBuilder(); 

        finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(xmlString.getBytes("UTF-8"))));                

        Transformer transformer = TransformerFactory.newInstance().newTransformer();

        //DocumentType doctype = xmlDocument.getDoctype();                    

        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "xdtd.dtd"); // doctype.getSystemId());
        transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "xxxx"); //doctype.getPublicId());
        transformer.transform(new DOMSource(finalDocument), new StreamResult(writer));

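        // Answer any request for a .dtd with an empty document so the parser
        // never tries to fetch the (possibly missing) DTD from its system ID.
        icBuilder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                if (systemId != null && systemId.contains(".dtd")) {
                    return new InputSource(new StringReader(""));
                } else {
                    // Fall back to the parser's default resolution.
                    return null;
                }
            }
        });
        // Re-parse the serialized output; the DOCTYPE is kept, the DTD itself is not loaded.
        finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(writer.toString().getBytes("UTF-8"))));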

        System.out.println(finalDocument.getDoctype().getPublicId());
        System.out.println("-----------");
        System.out.println(writer.toString());

    } catch (Exception e) {
        e.printStackTrace();
    }

Output:

      xxxx
     -----------


     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE document PUBLIC "xxxx" "xdtd.dtd">
     <document/>

Setting the factory features also works, without an entity resolver, but it must be done before creating the builder. Of those features, only http://apache.org/xml/features/nonvalidating/load-external-dtd is needed.
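As a rough sketch of the feature-based variant, assuming the JDK's built-in Xerces-derived parser (other parsers may reject this feature URI): the class name DoctypeWithoutDtdLoading and the helper parseDoctype are made up for illustration, and the public and system IDs are the ones from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeWithoutDtdLoading {

    // Xerces-specific feature URI; an assumption that the parser in use supports it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical helper: parses the XML without fetching the external DTD
    // and returns the document's DOCTYPE node.
    static DocumentType parseDoctype(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Configure the factory first; only builders created afterwards pick this up.
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE ds PUBLIC \"-//MBO//DTD artikel-at 1.0//DE\" "
                + "\"http://dtd.company.de/dtd-at/artikel.dtd\">"
                + "<ds>ABGB <cue>von</cue></ds>";
        DocumentType doctype = parseDoctype(xml);
        System.out.println(doctype.getPublicId());
        System.out.println(doctype.getSystemId());
    }
}
```

The DOCTYPE declaration itself is still read from the input; only the external subset it points to is skipped.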


Here is the fun thing though: It's getting set on-read as it appears:

[Two debugger screenshots: the document before and after accessing docType.]


In Xerces, this can be controlled with the feature http://apache.org/xml/features/dom/defer-node-expansion, which is true by default.
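A minimal sketch of switching deferred expansion off, again assuming the JDK's built-in Xerces-derived parser accepts both feature URIs. The class name EagerDomParse and the helper parseEagerly are hypothetical. With the feature disabled, the DOM nodes should be built during parse() rather than on first access, so a debugger would show the doctype right away.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EagerDomParse {

    // Both URIs are Xerces-specific; support by other parsers is not guaranteed.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical helper: builds a fully expanded DOM without loading the external DTD.
    static Document parseEagerly(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(DEFER_NODE_EXPANSION, false); // build nodes up front
            factory.setFeature(LOAD_EXTERNAL_DTD, false);    // do not fetch the DTD
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseEagerly(
                "<!DOCTYPE document PUBLIC \"xxxx\" \"xdtd.dtd\"><document/>");
        System.out.println(doc.getDoctype().getPublicId());
    }
}
```

Either way, getDoctype() returns the same information; the feature only changes when the nodes are materialized.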

Answer

Try this:

Transformer t = TransformerFactory.newInstance().newTransformer();
Source s = new StreamSource(new StringReader(inputXML));
StringWriter sw = new StringWriter();
t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "my.system.id");
t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "my/public/id");
t.transform(s, new StreamResult(sw));

No need for this to go via DOM at all.
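For completeness, here is the same idea as a self-contained sketch. The class name DoctypeViaTransformer and the helper withDoctype are invented for illustration, the IDs are taken from the question, and the input is assumed not to reference an external DTD itself (otherwise the transformer's parser may still try to load it).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DoctypeViaTransformer {

    // Hypothetical helper: copies the input through an identity transform
    // and lets the serializer emit a DOCTYPE with the given IDs.
    static String withDoctype(String inputXml, String publicId, String systemId) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // The public ID is only written when a system ID is set as well.
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<ds>ABGB <cue>von</cue></ds>";
        System.out.println(withDoctype(input,
                "-//MBO//DTD artikel-at 1.0//DE",
                "http://dtd.company.de/dtd-at/artikel.dtd"));
    }
}
```

The DOCTYPE name is taken from the root element of the output, so no DOM DocumentType needs to be created by hand.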