How to retrieve Google's "best guess" for an image from HTML using jsoup

We are trying to retrieve the "best guess" for an image from the HTML of the search results page returned by Google. We know that the best-guess link has the class qb-b, so we tried selecting 'a' elements with that class using jsoup's select method. However, when we printed the document retrieved with jsoup's get method, it did not contain the string "best guess" at all.

The code we wrote is below. How can we fix it?

String newUrl = connect1.getHeaderField("Location");

Document doc = Jsoup.connect(newUrl.toString()).get();            
Elements bestguess = doc.select("a.qb-b");

System.out.println(bestguess.toString());

There is 1 solution below

You have to set the User-Agent header. Without it, Google redirects you to its main page instead of returning the results page. Try:

String newUrl = connect1.getHeaderField("Location");

// newUrl is already a String, so no toString() call is needed.
Document doc = Jsoup.connect(newUrl)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36")
        .get();
Elements bestguess = doc.select("a.qb-b");

System.out.println(bestguess.toString());
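
The question does not show how connect1 is created. Assuming it is a plain java.net.HttpURLConnection, that first request may need the same header too, since it is the one that produces the Location redirect. Below is a minimal sketch of preparing such a request with only the standard library. The class name UserAgentRequest and the helper prepare are made up for illustration, and whether Google actually requires the header on this first request is an assumption, not something confirmed above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentRequest {
    // Browser-like User-Agent string, copied from the answer above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36";

    // Hypothetical helper: prepares (but does not send) a request that carries
    // the User-Agent header and does not follow redirects automatically, so
    // the caller can read the Location header itself.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setInstanceFollowRedirects(false);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No network traffic happens here; the connection is only configured.
        HttpURLConnection conn = prepare("https://www.google.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

With a connection prepared this way, connect1.getHeaderField("Location") can be called as in the question, and the resulting URL passed to Jsoup.connect with the same userAgent value.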