How can I get fully loaded HTML through python-mechanize?


Hi, I'm using python mechanize to get data from web pages. I'm trying to get the imgurl values from the Google image search results page so that I can download the result images.

Here's my code. I fill in the search form with 'dog' and submit it:

import mechanize
import cookielib
import urllib2
import urllib

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time = 1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'), ('Accept', '*/*') ,('Accept-Language', 'ko-KR')]

br.open('http://www.google.com/imghp?hl=en')
br.select_form(nr=0)
br.form['q'] = 'dog'
a = br.submit()
searched_url = br.geturl()

# Save the response body so it can be compared with the page source from Chrome
with open("1.html", "wb") as file0:
    file0.write(a.read())

When I view the page source in Chrome, it contains 'imgurl' entries. But when I read the data with python mechanize, there are none. Also, 1.html (which I write from Python) is much smaller than the HTML file downloaded from Chrome. How can I get exactly the same HTML data as a web browser using Python?
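For reference, this is roughly how the two files can be compared. It is only a minimal sketch: `summarize_html` is a helper name made up for this post, and `chrome.html` is an assumed file name for the page saved from Chrome (only `1.html` comes from the code above).

```python
import os


def summarize_html(path, marker=b"imgurl"):
    """Return (size_in_bytes, marker_count) for a saved HTML file."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), data.count(marker)


if __name__ == "__main__":
    # "chrome.html" is a hypothetical name for the page saved from the browser.
    for name in ("1.html", "chrome.html"):
        if os.path.exists(name):
            size, hits = summarize_html(name)
            print("%s: %d bytes, %d occurrences of imgurl" % (name, size, hits))
```

Running it next to both saved files prints the size and the number of 'imgurl' occurrences for each, which makes the difference described above easy to see.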

Do I have to set the same request headers as a web browser? Thanks.
