I am trying to download the HTML content of a web page, but the server responds with status 416 (Requested Range Not Satisfiable). I found a workaround that changes the status code to 200, but the downloaded content is still not correct. I think I am close but missing something. Please help.
Code with 416 status:
public static void main(String[] args) {
    String URL = "http://www.xyzzzzzzz.com.sg/";
    HttpClient client = new org.apache.commons.httpclient.HttpClient();
    org.apache.commons.httpclient.methods.GetMethod method = new org.apache.commons.httpclient.methods.GetMethod(URL);
    client.getHttpConnectionManager().getParams().setConnectionTimeout(AppConfig.CONNECTION_TIMEOUT);
    client.getHttpConnectionManager().getParams().setSoTimeout(AppConfig.READ_DATA_TIMEOUT);
    String html = null;
    InputStream ios = null;
    try {
        int statusCode = client.executeMethod(method);
        ios = method.getResponseBodyAsStream();
        html = IOUtils.toString(ios, "utf-8");
        System.out.println(statusCode);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (ios != null) {
            try {
                ios.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (method != null) method.releaseConnection();
    }
    System.out.println(html);
}
Code with 200 status (but the HTML content is not correct):
public static void main(String[] args) {
    String URL = "http://www.xyzzzzzzz.com.sg/";
    HttpClient client = new org.apache.commons.httpclient.HttpClient();
    org.apache.commons.httpclient.methods.GetMethod method = new org.apache.commons.httpclient.methods.GetMethod(URL);
    client.getHttpConnectionManager().getParams().setConnectionTimeout(AppConfig.CONNECTION_TIMEOUT);
    client.getHttpConnectionManager().getParams().setSoTimeout(AppConfig.READ_DATA_TIMEOUT);
    String html = null;
    InputStream ios = null;
    try {
        int statusCode = client.executeMethod(method);
        if (statusCode == HttpStatus.SC_REQUESTED_RANGE_NOT_SATISFIABLE) {
            method.setRequestHeader("User-Agent", "Mozilla/5.0");
            method.setRequestHeader("Accept-Ranges", "bytes=100-1500");
            statusCode = client.executeMethod(method);
        }
        ios = method.getResponseBodyAsStream();
        html = IOUtils.toString(ios, "utf-8");
        System.out.println(statusCode);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (ios != null) {
            try {
                ios.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (method != null) method.releaseConnection();
    }
    System.out.println(html);
}
Your first sample works for me without problems, and the second works if I remove the block that sets the request headers.
That is a bit strange; it may be a LAN configuration issue (firewall, proxy, etc.). In any case, HttpClient 3.1 is quite old. The same request made with HttpClient 4.x from Apache HttpComponents works as expected.
Try HttpClient 4. If you still get the same error, then the problem is not in your code.
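As a quick cross-check that does not depend on HttpClient 3.1 at all, here is a minimal sketch of the same fetch. Note that it uses the JDK's built-in `java.net.HttpURLConnection`, not the HttpClient 4.x suggested above, purely to stay dependency-free. The class `PageFetcher`, its `Page` holder, and the 10-second timeouts are hypothetical stand-ins (the question's `AppConfig` values are not shown). It assumes Java 9 or later for `readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original question or answer.
class PageFetcher {

    /** Status code plus UTF-8 decoded body of a single GET request. */
    static final class Page {
        final int status;
        final String html;

        Page(int status, String html) {
            this.status = status;
            this.html = html;
        }
    }

    static Page fetch(String address, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        // Only a User-Agent is set; no Range or Accept-Ranges header is sent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try {
            int status = conn.getResponseCode();
            // getInputStream() throws for 4xx/5xx responses, so read the error stream for those.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return new Page(status, "");
            }
            try (InputStream body = in) {
                return new Page(status, new String(body.readAllBytes(), StandardCharsets.UTF_8));
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: PageFetcher <url>");
            return;
        }
        // Timeouts are assumed values, in milliseconds.
        Page page = fetch(args[0], 10_000, 10_000);
        System.out.println(page.status);
        System.out.println(page.html);
    }
}
```

If this also returns 416 for your URL, that points away from your code and toward the server or something in between. Also keep in mind that `Accept-Ranges` is a response header; the request header for asking for a byte range is `Range`, and a request that does ask for a range would normally come back as partial content rather than the full page.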