How to fix the cyberneko "allow-selfclosing-iframe is not recognized" error in HtmlUnit?


I am trying to write a web scraping program using HtmlUnit. However, when I run it, I receive this error:

Exception in thread "main" com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:418)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:342)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:203)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:179)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:221)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:106)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:433)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)
    at ReviewScrapping.getCOntentData(ReviewScrapping.java:28)
    at ReviewScrapping.main(ReviewScrapping.java:34)
Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:411)
    ... 11 more

I already tried following the solution from "When using HtmlUnit, how can I configure the underlying NekoHtml parser?"

However, I still get the same error.

This is my current code, where I connect to the website:

    public static HtmlPage getCOntentData(String url) throws IOException {
        BrowserVersionFeatures[] bvf = new BrowserVersionFeatures[1];
        bvf[0] = BrowserVersionFeatures.HTMLIFRAME_IGNORE_SELFCLOSING;
        BrowserVersion bv = new BrowserVersion(
                BrowserVersion.NETSCAPE, "5.0 (Windows; en-US)",
                "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
                (float) 3.6, bvf);

        WebClient webClient = new WebClient(bv);
        webClient.setJavaScriptEnabled(true);

        return webClient.getPage(url);
    }

In my main method:

    HtmlPage site = getCOntentData("https://www.tokopedia.com/p/handphone-tablet");
    List<?> date = site.getByXPath("//div[@class='V4CqgZIv']");
    System.out.println(date.get(0));

This is what I have right now, and I am stuck on how to fix it.

All I want at this point is to make this error go away.


There is 1 best solution below

Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.

It looks like you have the wrong version of the neko parser on your classpath. Please use the latest version of HtmlUnit (2.35.0 at the moment). If you use Maven, make sure that no other part of the application overrides the neko-htmlunit dependency (it must also be at version 2.35.0). If you do not use Maven, download the file htmlunit-2.35.0-bin.zip and make sure your classpath contains only the correct version of each dependency.
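
To check which jars are actually being picked up, a small diagnostic like the one below can help. It is only a sketch: the class name ClasspathCheck and the helper describe are made up for illustration, and the two parser class names are assumptions (I believe net.sourceforge.htmlunit.cyberneko.HTMLConfiguration ships in neko-htmlunit and org.cyberneko.html.HTMLConfiguration in the original NekoHTML jar, so adjust them if your versions differ). It prints the location each class would be loaded from, which should reveal a stale or duplicate parser jar.

```java
import java.security.CodeSource;

public class ClasspathCheck {

    /** Returns a one-line description of where the named class would be loaded from. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself usually have no code source
            String where = (src == null) ? "JDK (no code source)" : String.valueOf(src.getLocation());
            return className + " -> " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.gargoylesoftware.htmlunit.WebClient",               // taken from the stack trace
            "net.sourceforge.htmlunit.cyberneko.HTMLConfiguration",  // assumed: neko-htmlunit
            "org.cyberneko.html.HTMLConfiguration"                   // assumed: original NekoHTML
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run it with the same classpath as your scraper. If the parser class resolves to an old jar, or both parser classes show up, then removing or upgrading the outdated jar is the likely fix.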