Scrape Product Image with BeautifulSoup (Error)


I need your help. I'm working on a Telegram bot that sends me all the sales from Amazon. It works well, except for this one function: it always raises the same error, which stops the script.

    imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format
AttributeError: 'NoneType' object has no attribute 'img'

import json

def take_image(soup):
    img_div = soup.find(id="imgTagWrapperId")

    imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format

    # Convert to a dictionary: each key is an image link and each value
    # gives its size (print the whole dictionary to inspect it)
    imgs_dict = json.loads(imgs_str)
    num_element = 0
    first_link = list(imgs_dict.keys())[num_element]
    return first_link

I still don't understand how to solve this issue. Thanks to all!

There are 2 best solutions below

0
Fishball Nooodles On

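From the looks of the error, soup.find returned None, meaning no element with id "imgTagWrapperId" was found in the HTML you parsed. Have you tried images = soup.find_all("img", {"id": "imgTagWrapperId"})? This returns a list, which is simply empty when nothing matches, so you can check it before using it.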
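
Either way, it may help to guard against the missing element instead of letting the script crash. Below is a minimal sketch of the function from the question with None checks added. It assumes soup is a BeautifulSoup object and that returning None for "no image found" suits the bot; that return convention is my assumption, not something from the question.

```python
import json


def take_image(soup):
    """Return the first product image URL, or None if it cannot be found."""
    # soup.find returns None when no element has this id, which is what
    # raised the AttributeError in the question.
    img_div = soup.find(id="imgTagWrapperId")
    if img_div is None or img_div.img is None:
        return None

    # The attribute is expected to hold a JSON object whose keys are image URLs.
    imgs_str = img_div.img.get("data-a-dynamic-image")
    if not imgs_str:
        return None

    try:
        imgs_dict = json.loads(imgs_str)
    except json.JSONDecodeError:
        return None
    if not isinstance(imgs_dict, dict):
        return None

    # Take the first URL, or None when the object is empty.
    return next(iter(imgs_dict), None)
```

The caller can then skip the picture, or send the message without it, whenever take_image(soup) returns None.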

3
just a stranger On

Images are not embedded in the HTML page; they are linked from it and loaded afterwards, so you need to wait until they have loaded. Here are two options:

1-) (not recommended, because there may be a margin of error) Simply wait until the image has loaded; for this you can use time.sleep().

2-) (recommended) I would rather use Selenium WebDriver. You also have to wait when you use Selenium, but the good thing is that Selenium has a dedicated function for this job. Here is how to do it with Selenium:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Chrome()
browser.get("url")  # replace "url" with the address of the product page
delay = 3  # seconds
try:
    # Wait until the element you are looking for is present in the page
    myElem = WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.ID, 'imgTagWrapperId'))
    )
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
More Documentation

Code example for way 1

Q/A for way 2