I use HttpURLConnection to crawl https://translate.google.com/.
InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 1082);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("https://translate.google.com/");
HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
conn.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch");
conn.setRequestProperty("Connection", "keep-alive");
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36");
conn.setRequestProperty("Accept", "*/*");
Map<String, List<String>> reqHeaders = conn.getHeaderFields();
List<String> reqTypes = reqHeaders.get("Content-Type");
for (String ss : reqTypes) {
System.out.println(ss);
}
InputStream in = conn.getInputStream();
String s = IOUtils.toString(in, "UTF-8");
System.out.println(s.substring(0, 100));
Map<String, List<String>> resHeader = conn.getHeaderFields();
List<String> resTypes = resHeader.get("Content-Type");
for (String ss : resTypes) {
System.out.println(ss);
}
The console output is garbled.
But when I change the URL to http://translate.google.com/, it works well.
I know that the HttpURLConnection is actually an HttpsURLConnection when I crawl https://translate.google.com/. I tried using HttpsURLConnection directly and the output is still garbled.
Any suggestions?
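The response is compressed, because the line conn.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch") tells the server that the client can handle the encodings listed in Accept-Encoding. HttpURLConnection does not decompress the body for you when you set this header yourself, so the compressed bytes are being decoded as UTF-8 text. Either comment out this line or decompress the response yourself.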
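If you want to keep the Accept-Encoding header, one option is to pick a decompressing stream based on the Content-Encoding response header. The following is only a minimal sketch: the class name ResponseDecoder and its methods decode, readAll and gzip are hypothetical helpers, not part of any library, and it assumes the server answers with gzip or zlib-wrapped deflate (sdch and anything else is passed through unchanged).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ResponseDecoder {

    // Hypothetical helper: wraps the raw body stream according to the
    // value of the Content-Encoding response header (may be null).
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // body was not compressed
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        if (enc.equals("gzip")) {
            return new GZIPInputStream(raw);
        }
        if (enc.equals("deflate")) {
            // Assumes zlib-wrapped deflate; raw deflate would need new Inflater(true).
            return new InflaterInputStream(raw);
        }
        return raw; // unknown encoding (e.g. sdch): returned as-is
    }

    // Reads the whole stream and decodes it as UTF-8 text.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Demo-only helper: gzips a string so the sketch can run without a network.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulates a gzip-compressed response body.
        byte[] compressed = gzip("<!DOCTYPE html>");
        InputStream in = decode(new ByteArrayInputStream(compressed), "gzip");
        System.out.println(readAll(in));
    }
}
```

In the code from the question, this would roughly correspond to passing conn.getInputStream() and conn.getContentEncoding() to decode before reading the body as UTF-8.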
There's also a more specific implementation for HTTPS, HttpsURLConnection, in case you're interested in HTTPS-specific features, e.g.: