How to filter response types in HtmlUnit?

While crawling a webpage I am getting various response types (image, text, HTML, JSON, CSS, JS, etc.). I only need the .json files. How can I filter out the other response types using HtmlUnit?

The problem is that the required data is stored in a specific .json file whose URL is not unique, so I am planning to filter out the other response types and download the content of all the .json files. I will clean the data later.

Please help. Just an idea will be enough.

There is 1 solution below

You can see and modify the requests and responses by wrapping the WebClient's connection, as hinted here.

Check whether the request URL contains the string .json and, if so, save the response content:

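    // Package names below assume HtmlUnit 2.x; HtmlUnit 3.x and later use org.htmlunit instead.
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import com.gargoylesoftware.htmlunit.WebRequest;
    import com.gargoylesoftware.htmlunit.WebResponse;
    import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;

    // The constructor installs the wrapper as the WebClient's connection,
    // so every request made by webClient passes through getResponse().
    new WebConnectionWrapper(webClient) {
        @Override
        public WebResponse getResponse(WebRequest request) throws IOException {
            WebResponse response = super.getResponse(request);
            if (request.getUrl().toExternalForm().contains(".json")) {
                // Recent HtmlUnit versions take a Charset here; older 2.x
                // releases accepted the charset name ("UTF-8") as a String.
                String content = response.getContentAsString(StandardCharsets.UTF_8);

                // save content
            }
            return response;
        }
    };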
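
Since the question is about response types rather than URLs, the check could also look at the Content-Type header and fall back to the URL only when the server mislabels the file. The following is a minimal sketch of such a check in plain Java. The class `JsonResponseFilter` and its method `isJson` are hypothetical names introduced for illustration and are not part of HtmlUnit.

```java
import java.util.Locale;

/** Hypothetical helper (not part of HtmlUnit): decides whether a response looks like JSON. */
final class JsonResponseFilter {

    private JsonResponseFilter() { }

    /**
     * @param contentType the Content-Type value of the response; may be null or empty
     * @param url         the request URL in external form; may be null
     */
    static boolean isJson(String contentType, String url) {
        if (contentType != null && !contentType.isEmpty()) {
            // Drop parameters such as "; charset=utf-8" and normalise case.
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            // Matches application/json, text/json and suffixed types like application/ld+json.
            if (mime.endsWith("/json") || mime.endsWith("+json")) {
                return true;
            }
        }
        if (url == null) {
            return false;
        }
        // Fallback for mislabelled responses: test the path only,
        // ignoring the query string and fragment.
        String path = url.split("[?#]", 2)[0].toLowerCase(Locale.ROOT);
        return path.endsWith(".json");
    }
}
```

Inside `getResponse`, the condition could then become `JsonResponseFilter.isJson(response.getContentType(), request.getUrl().toExternalForm())`. Unlike a plain `contains(".json")`, this does not match URLs that merely carry `.json` in a query parameter.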