Google redirects query request 503 error


I'm trying to write a basic program that runs a Google search. The first step is fetching the Google results page, for which I use:

http://google.com/search?q=something+somethang

Here 'something somethang' is the query. According to the logging output, I'm being redirected:

2015-06-10 13:08:36,815 - INFO - Starting new HTTP connection (1): google.com
2015-06-10 13:08:37,487 - DEBUG - "GET /search?q=something+somethang HTTP/1.1" 302 359
2015-06-10 13:08:37,601 - INFO - Starting new HTTP connection (1): ipv4.google.com
2015-06-10 13:08:37,750 - DEBUG - "GET /sorry/IndexRedirect?continue=http://google.com/search%3Fq%3Dsomething%2Bsomethang&q=CGMSBJgH4AYYzN_hqwUiGQDxp4NLfKUWBsQJL2TkqfCe8pFtltJvTB0 HTTP/1.1" 503 2659
2015-06-10 13:08:37,831 - DEBUG - 503

The last line is the status_code of the request, which I printed myself.

I checked the link:

google.com/sorry/IndexRedirect?continue=http://google.com/search%3Fq%3Dsomething%2Bsomethang&q=CGMSBJgH4AYYzN_hqwUiGQDxp4NLfKUWBsQJL2TkqfCe8pFtltJvTB0

and it's Google's check for bots and other automated traffic. Is there no way to make the program work?

Best,


There are 2 best solutions below

0
On

It seems that whatever method you are using to fetch the web page doesn't have follow-redirects set to True.

The following should work:

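import requests

# requests follows redirects by default; allow_redirects is spelled out for clarity
r = requests.get('http://google.com/search?q=something+somethang', allow_redirects=True)
print(r.status_code)
print(r.content)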

You may also need to send a User-Agent header.
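To tell whether the final response is Google's bot check rather than real results, one option is to look at the URL you ended up on after the redirects. This is only a minimal sketch: it assumes the check page is always served under a /sorry/ path, as in the log from the question, and the helper name looks_like_block_page is made up for illustration.

```python
from urllib.parse import urlparse


def looks_like_block_page(final_url):
    """Heuristic: True if the URL looks like Google's bot-check page.

    Assumption: the check page lives under a /sorry/ path, as seen in the
    log from the question. This is not a documented Google contract.
    """
    return urlparse(final_url).path.startswith("/sorry/")


blocked_url = (
    "http://ipv4.google.com/sorry/IndexRedirect"
    "?continue=http://google.com/search%3Fq%3Dsomething%2Bsomethang"
)
print(looks_like_block_page(blocked_url))
print(looks_like_block_page("http://google.com/search?q=something+somethang"))
```

With requests, r.url holds the final URL after redirects and r.history lists the intermediate responses, so r.url is the value you would pass in.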

0
On

You need to send a user-agent header so that the request looks like a visit from a "real" user.

A bot or script can send a fake user-agent string to present itself as a different client, such as a browser. The requests library identifies itself as python-requests by default, so Google recognizes the request as coming from a script and blocks it.

You can read more about it in the blog post I wrote about how to reduce the chance of being blocked while web scraping.

Pass a user-agent:

import requests

# A browser-like User-Agent instead of the default python-requests one
headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

response = requests.get('https://google.com/search?q=something+somethang', headers=headers)
print(response.status_code)
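If you would rather not hand-write the query string, the standard library can build the URL and attach the header before anything is sent. Here is a minimal sketch using only urllib. The function name build_search_request and the shortened User-Agent value are placeholders of mine, not part of the answer above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder value; substitute the User-Agent string of a real browser.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_search_request(query, user_agent=BROWSER_UA):
    """Build (but do not send) a GET request for a Google search."""
    # urlencode escapes the query and turns spaces into '+'.
    url = "https://google.com/search?" + urlencode({"q": query})
    return Request(url, headers={"User-Agent": user_agent})


req = build_search_request("something somethang")
print(req.full_url)
# urllib stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would actually send it. A browser-like header lowers the chance of being blocked, but it does not guarantee that Google returns results.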

Alternatively, you can do the same thing by using Google Organic Result API from SerpApi. It's a paid API with a free plan.

The difference is that you only need to pull the data you want out of structured JSON, instead of working out how to avoid blocks or why a request doesn't behave as expected.

Example code to integrate:

import os
from serpapi import GoogleSearch

params = {
  "engine": "google",
  "q": "tesla",
  "hl": "en",
  "gl": "us",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
  print(result['title'])
  print(result['link'])
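The loop above assumes every response contains an organic_results key and that every entry has both a title and a link. A hedged sketch of a more defensive version follows, run against a hand-made dictionary rather than a real API response. The helper name extract_links and the sample data are invented for illustration.

```python
def extract_links(results):
    """Return (title, link) pairs, skipping entries with missing fields."""
    pairs = []
    for result in results.get("organic_results", []):
        title = result.get("title")
        link = result.get("link")
        if title and link:
            pairs.append((title, link))
    return pairs


# Invented sample data shaped the way the loop above expects.
sample = {
    "organic_results": [
        {"title": "Example", "link": "https://example.com"},
        {"title": "No link here"},
    ]
}
print(extract_links(sample))
print(extract_links({}))
```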

Disclaimer: I work for SerpApi.