I want to use HtmlUnit to get a link from a web page.
Here is my code:
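    import java.util.logging.Level;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebClientOptions;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    String url = "https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/farmaco?farmaco=012745";
    try {
        // Silence HtmlUnit's own logging
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
        final WebClient webClient = new WebClient();
        WebClientOptions wco = webClient.getOptions();
        // Accept the site's certificate without validation
        wco.setUseInsecureSSL(true);
        final HtmlPage page = webClient.getPage(url);
        // Look up the anchor by its id and print it
        final HtmlElement list = page.getHtmlElementById("link_FI");
        System.out.println(list.toString());
    } catch (Exception e) {
        e.printStackTrace();
    }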
I want to obtain the link to "foglio illustrativo pdf". Inspecting the HTML in Chrome (with the Inspect function), I can see it is inside an 'a' tag with id "link_FI". But when I run the code above, the href attribute contains only "#". The result is this:
HtmlAnchor[<a id="link_FI" href="#" title="Foglio Illustrativo">]
But in the browser the href isn't empty. Why?
The website loads some content from the server after the initial page load and then modifies the link you are querying. If your web client does not execute all of the JavaScript, the hrefs may very well be empty.
To check this, disable JavaScript in the browser and load the page. The anchor tag you are looking at then looks like the one in your output, with href="#".
Solving this issue is not easy. I would suggest using a full-blown browser with JavaScript support and grabbing the page with that. It seems that javafx.scene.web.WebView should do what you want: it wraps WebKit and should contain proper JavaScript support, but I have never used it. The same applies to HtmlUnit: it says it supports the JavaScript you need, but I cannot provide you with an example. Sorry.
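If the link is filled in by a script some time after the page loads, one possible approach is to re-read the attribute until it is no longer the placeholder. The following is only a sketch of that idea in plain Java, not a tested solution for this site. The class HrefPoller and its methods pollUntil and isRealLink are hypothetical names made up for illustration and are not part of HtmlUnit or JavaFX.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper (not part of HtmlUnit): re-reads a value until it looks ready.
final class HrefPoller {
    private HrefPoller() {}

    /**
     * Calls reader repeatedly until isReady accepts the value or the timeout elapses.
     * Returns the accepted value, or Optional.empty() on timeout or interruption.
     */
    static Optional<String> pollUntil(Supplier<String> reader, Predicate<String> isReady,
                                      long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            String value = reader.get();
            if (value != null && isReady.test(value)) {
                return Optional.of(value);
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }

    /** Assumption: an empty href or a bare "#" is a placeholder, anything else is a real link. */
    static boolean isRealLink(String href) {
        String trimmed = href.trim();
        return !trimmed.isEmpty() && !trimmed.equals("#");
    }
}
```

With HtmlUnit, the reader could be something like () -> page.getHtmlElementById("link_FI").getAttribute("href"), called after giving background scripts time to run with webClient.waitForBackgroundJavaScript(...). Whether HtmlUnit's JavaScript engine actually executes this particular site's scripts is an assumption you would have to verify.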