Parse HTML document with NekoHTML

4.7k Views Asked by tt0686 At 21 June 2025 at 05:40

I am using the NekoHTML framework with Xerces 2.11.0 to parse an HTML document, but I am having a problem with this simple code:

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DOMParser parser = new DOMParser();
System.out.println(parser.getClass().toString());
// The InputSource is built from a system identifier (the page URL)
InputSource url = new InputSource("http://www.cbgarden.org");
try {
    parser.parse(url);
    Document document = parser.getDocument();
    System.out.println(document.hasChildNodes());
    System.out.println(document.getBaseURI());
    System.out.println(document.getNodeName());
    System.out.println(document.getNodeValue());
} catch (Exception e) {
    e.printStackTrace();
}

Here is the output of the print statements:

class org.cyberneko.html.parsers.DOMParser
true
http://www.cbgarden.org
document
null

So my question is: what could be wrong? No exception is thrown, and I am following the usage rules given in the NekoHTML documentation. The libraries on my build path are in this order:

nekohtml.jar
nekohtmlSamples.jar
xercesImpl.jar
xercesSamples.jar
xml-apis.jar


There is 1 best solution below

Martijn Courteaux On 11 October 2011 at 16:30

I guess your question is about the null?
The document node itself has no value, so getNodeValue() returns null. It only has child nodes (such as <html>, which contains <head> and <body>).
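To illustrate the point, here is a minimal sketch that walks from the document node down to its children. It assumes the JDK's built-in XML parser as a stand-in for NekoHTML, since both hand back a standard org.w3c.dom.Document. The class name DocumentNodeDemo and the helpers parse and childNames are made up for this example, and the markup string is a hand-written, well-formed sample rather than the real page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocumentNodeDemo {

    // Hypothetical helper: parses well-formed markup with the JDK's DOM parser
    // (an assumption for this sketch; the question uses NekoHTML's DOMParser).
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Hypothetical helper: collects the node names of a node's direct children.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(
                "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>");

        // The document node carries no value of its own.
        System.out.println(document.getNodeValue());   // null

        // The content lives in the subtree below the root element.
        Element root = document.getDocumentElement();
        System.out.println(root.getNodeName());        // html
        System.out.println(childNames(root));          // [head, body]
        System.out.println(root.getTextContent());     // DemoHello
    }
}
```

With the parser from the question, the same calls (getDocumentElement(), getChildNodes(), getTextContent()) should apply to the Document returned by parser.getDocument(), though the element names it reports may differ in case.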

But if you want the whole page source as a String, you can simply download it with the openStream() method of java.net.URL.
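A rough sketch of that download approach might look like this. The class name PageSource and the helpers readAll and fetch are invented for the example, and UTF-8 is only an assumed encoding; the real page may declare a different charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageSource {

    // Hypothetical helper: reads the stream to the end and decodes the bytes.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Hypothetical helper: opens the URL with openStream() and returns its content.
    static String fetch(URL url, Charset charset) throws IOException {
        try (InputStream in = url.openStream()) {
            return readAll(in, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 is an assumption; check the page's declared encoding.
        URL url = URI.create("http://www.cbgarden.org").toURL();
        String html = fetch(url, StandardCharsets.UTF_8);
        System.out.println(html);
    }
}
```

This returns the raw markup as sent by the server, without any of the cleanup a parser such as NekoHTML would apply.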
