HtmlUnit: scraping a Google+ page with JavaScript, clicking the "Show more" button does not work

I am trying to scrape this page: https://plus.google.com/115016587855962294424/about. Everything works fine, but when I try to click "Show more" to load more reviews, nothing happens. Here is my code:

final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage page = webClient.getPage("https://plus.google.com/115016587855962294424/about");
assertEquals(200, page.getWebResponse().getStatusCode());
assertEquals("OK", page.getWebResponse().getStatusMessage());
System.out.println(page.getWebResponse().getStatusCode());

Clicking "Show more" here:

HtmlSpan advancedSearchAn = (HtmlSpan) page.getFirstByXPath("//*[@id=\"115016587855962294424-about-page\"]/div/div[1]/div/div/div[2]/div[3]/span[1]");
page = advancedSearchAn.click();

But nothing happens. I even tried:

// webClient.waitForBackgroundJavaScript(10 * 1000);
// webClient.setAjaxController(new NicelyResynchronizingAjaxController());
// webClient.setAjaxController(new AjaxController() {
//     @Override
//     public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
//     {
//         return true;
//     }
// });

Any suggestions?

UPDATE:

*I was advised to modify the incoming JavaScript code by wrapping the connection in a WebConnectionWrapper and overriding getResponse(), like this:*

new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // System.out.println("content");
        WebResponse response = super.getResponse(request);
        if (request.getUrl().toExternalForm().contains("https://plus.google.com/115016587855962294424/about")) {
            String content = response.getContentAsString("UTF-8");

            // change content -- what needs to be changed?

            System.out.println("content " + content);
            // Rebuild the response around the (possibly modified) content.
            WebResponseData data = new WebResponseData(content.getBytes("UTF-8"),
                    response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
            response = new WebResponse(data, request, response.getLoadTime());
        }
        System.out.println("content " + response.getContentAsString());
        return response;
    }
};

Any suggestions on how exactly this can be done and what needs to be modified? I have tried the following APIs: HtmlUnit, jsoup, Web-Harvest, and Selenium.

There is 1 best solution below

Clicking "Show more" submits an AJAX request, and the DOM is updated when the response comes back.

HtmlUnit's JavaScript support is limited and tends to break on script-heavy pages like this one, so instead analyze the request being sent with a proxy tool and reproduce it manually in code.

I use Fiddler as a proxy tool.
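
As a rough sketch of what "code it manually" could look like, the snippet below rebuilds a captured form POST with the JDK's own java.net.http client (Java 11+) instead of HtmlUnit. Everything specific in it is an assumption: the class name AjaxReplay, the field names pageToken and count, and the placeholder values are hypothetical. The real URL, headers, and body must be copied from what the proxy shows for the "Show more" click, and the real request may well not be a plain form POST at all.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class AjaxReplay {

    // Encodes fields as application/x-www-form-urlencoded, keeping the map's iteration order.
    public static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds a POST request that mirrors a request captured in the proxy.
    public static HttpRequest buildRequest(String url, Map<String, String> fields, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
                .header("User-Agent", userAgent)
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // HYPOTHETICAL field names and values: replace with the ones seen in the proxy.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("pageToken", "VALUE_FROM_PROXY");
        fields.put("count", "10");
        System.out.println("Body to send: " + encodeForm(fields));

        // The request is only sent when the captured endpoint URL is passed as the first argument.
        if (args.length > 0) {
            HttpRequest request = buildRequest(args[0], fields, "Mozilla/5.0");
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Run it with the captured endpoint URL as the only argument. If the site also expects cookies or extra headers from the original page load, those would need to be copied from the proxy capture as well. The response body can then be parsed directly rather than waiting for HtmlUnit to apply it to the DOM.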