Unable to parse JavaHelp's toc.xml

I wrote an XML parser for JavaHelp's toc.xml file in order to intercept some attributes I use in the tocitem tag that are ignored by JavaHelp. Here's what toc.xml looks like:

<?xml version='1.0' encoding='ISO-8859-1'  ?>

<!DOCTYPE toc
   PUBLIC "-//Sun Microsystems Inc.//DTD JavaHelp TOC Version 2.0//EN"
     "http://java.sun.com/products/javahelp/toc_2_0.dtd">
<toc version="2.0">
   <tocitem text="Introduction" target="intro" action="myapp.help.introAction"/>
</toc>

I am parsing toc.xml using the standard SAX parser. When I parse the file, I get the following exception:

myapp.help.TOCTreeFactory[WARN]: Failed to load TOC file from 'jar:file:/home/samad/myapp.jar!/workflow-help/toc.xml'

Caused by:
http://java.sun.com/javase/technologies/desktop/javahelp/toc_2_0.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at myapp.help.TOCTreeFactory.createTree(TOCTreeFactory.java:43)
...

I opened the URL http://java.sun.com/javase/technologies/desktop/javahelp/toc_2_0.dtd in a browser and got a 404.

How can I resolve this problem? I tried downloading the JavaHelp distribution, but it doesn't contain the toc_2_0.dtd file that is needed by SAX.

There are 2 best solutions below

1
On BEST ANSWER

Have you tried turning off validation in the factory?

SAXParserFactory pf = SAXParserFactory.newInstance();
pf.setValidating(false);

Another alternative: see "Stop your Java SAX parser from downloading DTDs".
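
One common way to keep a SAX parser from fetching a DTD is to install an EntityResolver that answers every external entity request with an empty source. Whether that is exactly what the linked article does is not confirmed here, so treat the following as a minimal sketch rather than the article's method. The class name OfflineTocParser and the method parseActions are hypothetical, and the sketch assumes the TOC is available as a String and that only the custom action attribute of each tocitem is of interest.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a JavaHelp TOC without touching the network.
public class OfflineTocParser {

    public static List<String> parseActions(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Answer every external entity lookup (including the DOCTYPE's DTD)
        // with an empty document, so no HTTP request is ever made.
        reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        List<String> actions = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Collect the custom attribute that JavaHelp itself ignores.
                if ("tocitem".equals(qName)) {
                    actions.add(attrs.getValue("action"));
                }
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return actions;
    }
}
```

Because the resolver replaces the DTD with nothing, any attribute defaults or entities the real DTD declares would not be applied; for a TOC that spells out all of its attributes this should not matter.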

0
On

When searching for solutions to a similar issue, I was first directed to this question. The setValidating() method didn't work for me either. This answer to a related question pointed me toward SAXParserFactory's setFeature() method, which did work.

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
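
For context, here is a rough end-to-end sketch of how that feature might be used to read the custom action attribute from the toc.xml in the question. The class name TocFeatureParser and the method readActions are made up for illustration, and the sketch assumes a Xerces-based parser (such as the one bundled with the JDK), since the feature URI is Xerces-specific and other parsers may reject it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: maps each tocitem's target to its custom action.
public class TocFeatureParser {

    // Xerces-specific feature; assumed to be recognized by the parser in use.
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    public static Map<String, String> readActions(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // Tell the non-validating parser not to fetch the DTD named in DOCTYPE.
        spf.setFeature(LOAD_EXTERNAL_DTD, false);
        SAXParser parser = spf.newSAXParser();

        Map<String, String> actionsByTarget = new LinkedHashMap<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("tocitem".equals(qName)) {
                    actionsByTarget.put(attrs.getValue("target"),
                                        attrs.getValue("action"));
                }
            }
        });
        return actionsByTarget;
    }
}
```

With the toc.xml shown in the question, readActions should map the target intro to myapp.help.introAction without any attempt to download toc_2_0.dtd.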