I have a situation where I'd like to start using an XML Schema to validate documents that, until now, have never had a schema definition. As such, the existing documents I'd like to validate do not have any xmlns
declaration in them.
I have no problem successfully validating a document which does include the xmlns
declaration, but I'd also like to be able to validate those documents without such a declaration. I was hoping for something like this:
DocumentBuilderFactory dbf = ...;
dbf.setSchema(... my schema for namespace "foo:bar"...);
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setDefaultNamespace("foo:bar");
Document doc = db.parse(input);
There is no such method DocumentBuilder.setDefaultNamespace
and so the schema validation is not performed when loading documents of this type.
Is there any way to force the namespace for a document if one is not set? Or does that require essentially parsing the XML without regard to schema, checking for an existing namespace, adjusting it, then re-validating the document with the schema?
I'm currently expecting the parser to perform validation during parsing, but I have no problem parsing first and then validating afterward.
UPDATE 2021-01-13
Here is a concrete example of what I'm trying to do, as a JUnit test case.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLSchemaTest
{
private static final String XMLNS = "http://www.example.com/schema";
private static final String schemaDocument = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\"><xs:element name=\"example\" type=\"e:exampleType\" /><xs:complexType name=\"exampleType\"><xs:sequence><xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType><xs:complexType name=\"testType\" /></xs:schema>";
private static Document parse(String document) throws SAXException, ParserConfigurationException, IOException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Source[] sources = new Source[] {
new StreamSource(new StringReader(schemaDocument))
};
Schema schema = sf.newSchema(sources);
dbf.setSchema(schema);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setErrorHandler(new MyErrorHandler());
return db.parse(new InputSource(new StringReader(document)));
}
@Test
public void testConformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
@Test
public void testConformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
@Test
public void testNononformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
@Test
public void testNononformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
public static class MyErrorHandler implements ErrorHandler {
@Override
public void warning(SAXParseException exception) throws SAXException {
System.err.println("WARNING: " + exception);
}
@Override
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
System.err.println("FATAL: " + exception);
}
}
}
All of the tests pass except for testConformingDocumentWithoutSchema
. I think this is kind of expected, as the document declares no namespace.
I'm asking how the test can e changed (but not the document itself!) so that I can validate the document against a schema that was not actually declared by the document.
I pounded on this for a while, and I was able to come up with a hack that works. It may be possible to do this more elegantly (which was my original question), and it also may be possible to do this with less code, but this was what I was able to come up with.
If you look at the JUnit test case in the question, changing the "parse" method to the following (and adding
XMLNS
as the second argument to all calls toparse
) will allow all tests to complete:This works by catching
SAXParseException
and, becauseSAXParseException
doesn't give up any of its details, assuming that the problem might be due to a missing XML namespace declaration. I then re-parse the document without the schema validation, add a namespace declaration to the in-memoryDocument
, then serialize theDocument
toString
and re-parse the document with the schema validation re-enabled.I tried to do this just by setting the XML namespace and then using
Schema.newValidator().validate(new DOMSource(doc))
, but this failed validation every time for me. Running through the serializer got around that problem.