I'm scraping Google search results, and I want to get the photos of the products that show up in them.
import requests, json, re
from parsel import Selector

params = {
    "q": "tutku migros",
    "hl": "tr",  # language
    "gl": "tr",  # country of the search, US -> USA
    # "tbm": "shop"  # google search shopping tab
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}

html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
selector = Selector(text=html.text)
results = selector.css(".LicuJb")
a = results.css("img::attr(src)").extract()
This is the output I got:
['data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==', 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==', 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==', 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==', 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==']
This is not what I expected: these aren't the images I see in the browser, and all five values are identical.
We can save the raw HTML and take a look at it to get a better idea of what's going on here. I added this to the end of your script:
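The snippet itself isn't reproduced here, so the following is only a minimal sketch of what it could look like. The `save_html` helper is a hypothetical name introduced for illustration; `out.html` is the file name referred to in the next paragraph, and `html` is the response object from the script in the question.

```python
from pathlib import Path


def save_html(text, path="out.html"):
    """Write the raw HTML to disk so it can be inspected in an editor."""
    target = Path(path)
    # UTF-8 keeps non-ASCII characters (e.g. Turkish letters) intact.
    target.write_text(text, encoding="utf-8")
    return target


# At the end of the script from the question, this would be called as:
# save_html(html.text)
```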
If we take a look at out.html and search for the LicuJb class, we see that parsel is actually getting the correct values. Then why do we see different images when we open that page in a web browser? Because the page runs some JavaScript that eventually replaces the placeholder image sources with real image data. Python's requests library simply fetches the static HTML, so the JavaScript never runs and the placeholders never get replaced. This article explains the issue a little more.

The solution is to use a Python library that lets the JavaScript run, such as Selenium. Rather than just fetching the static HTML, Selenium drives a real web browser, so it can run the JavaScript of dynamic web pages. (This also means it takes much longer.) Here's how you might get the images you're looking for using Selenium:
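The answer's original Selenium code isn't reproduced here, so what follows is a rough sketch rather than the author's exact solution. It assumes Selenium 4 with a local Chrome install. The helper names (`build_search_url`, `is_placeholder`, `fetch_image_sources`) are hypothetical, and the `LicuJb` class is taken from the question and may change whenever Google updates its markup. Google may also show a consent page or block automated traffic, which this sketch does not handle.

```python
from urllib.parse import urlencode

# The 1x1 transparent GIF that the static HTML uses as a placeholder
# (copied from the output shown in the question).
PLACEHOLDER_SRC = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
)


def build_search_url(query, lang="tr", country="tr"):
    """Build the same search URL that the requests call in the question uses."""
    params = {"q": query, "hl": lang, "gl": country}
    return "https://www.google.com/search?" + urlencode(params)


def is_placeholder(src):
    """True if the src is the placeholder GIF rather than real image data."""
    return src == PLACEHOLDER_SRC


def fetch_image_sources(query, css_class="LicuJb", timeout=10):
    """Load the results page in headless Chrome and return the real image sources."""
    # Imported here so the helpers above also work without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)

    def current_sources(drv):
        images = drv.find_elements(By.CSS_SELECTOR, f".{css_class} img")
        return [img.get_attribute("src") for img in images]

    try:
        driver.get(build_search_url(query))
        try:
            # Wait until the page's JavaScript has swapped at least one placeholder.
            WebDriverWait(
                driver, timeout, ignored_exceptions=[StaleElementReferenceException]
            ).until(
                lambda drv: any(
                    src and not is_placeholder(src) for src in current_sources(drv)
                )
            )
        except TimeoutException:
            pass  # nothing was swapped in time; return whatever is there
        return [
            src for src in current_sources(driver) if src and not is_placeholder(src)
        ]
    finally:
        driver.quit()  # always close the browser, even on errors


# Example call (starts a real browser, so it is left commented out):
# print(fetch_image_sources("tutku migros"))
```

The real sources may still be `data:` URIs (base64-encoded image data) rather than plain URLs, so they may need decoding before being saved to disk.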