Image Downloading Issue in Python: Code Not Progressing as Expected


I want to download images from the URLs listed in a DataFrame, df, and organize them into directories based on the brand's ID. (df contains the columns product_id, brand_id and image_url.) My issue is that execution seems to stop: the code does not continue to the next image URL when it encounters certain errors, such as HTTP errors or an image that is not found.

import os
import requests
from urllib.parse import urlparse
import pandas as pd
import time

# Create a directory to store the images
os.makedirs("product_images", exist_ok=True)
brands_downloaded = {}

for index, row in df.iterrows():
    brand_id = row['brand_id']
    image_url = row['image_url']

    try:
        # Get the file name from the URL
        filename = os.path.basename(urlparse(image_url).path)

        # Define the full save path for the image
        save_path = os.path.join("product_images", f"brand_{brand_id}", filename)

        # Check if the brand has already been downloaded
        if brand_id in brands_downloaded:
            os.makedirs(os.path.dirname(save_path), exist_ok=True)
            with open(save_path, 'wb') as file:
                response = requests.get(image_url, stream=True)
                if response.status_code == 200:
                    for chunk in response.iter_content(1024):
                        file.write(chunk)
                    print(f"Image downloaded for brand {brand_id} and saved as {save_path}")
                else:
                    print(f"HTTP Error during download for brand {brand_id}: HTTP Error {response.status_code}")
        else:
            os.makedirs(os.path.dirname(save_path), exist_ok=True)
            with open(save_path, 'wb') as file:
                response = requests.get(image_url, stream=True)
                if response.status_code == 200:
                    for chunk in response.iter_content(1024):
                        file.write(chunk)
                    print(f"Image downloaded for brand {brand_id} and saved as {save_path}")
                    brands_downloaded[brand_id] = True
                else:
                    print(f"HTTP Error during download for brand {brand_id}: HTTP Error {response.status_code}")

        # Add a delay to avoid overloading the remote server
        time.sleep(1)

    except requests.exceptions.RequestException as e:
        # Handle request errors, such as connection errors, with requests.exceptions.RequestException
        print(f"Request error during download for brand {brand_id}: {str(e)}")
        continue

    except Exception as e:
        # Handle other errors with Exception
        print(f"Error during download for brand {brand_id}: {str(e)}")

print("Download completed.")

The result:

Error during download for brand 6582: Can't mix strings and bytes in path components
Image downloaded for brand 5947 and saved as product_images/brand_5947/41G8JXdqpaL._AA160_QL70_.jpg
Image downloaded for brand 6368 and saved as product_images/brand_6368/yo_ecos_es31_4.jpg
Error during download for brand 6197: [Errno 21] Is a directory: 'product_images/brand_6197/'
Image downloaded for brand 6368 and saved as product_images/brand_6368/YH_ADVAN_A035.jpg
HTTP Error during download for brand 7223: HTTP Error 403
Image downloaded for brand 4883 and saved as product_images/brand_4883/41yMo9r3QDL._AA160_QL70_.jpg
HTTP Error during download for brand 6197: HTTP Error 404
Error during download for brand 6850: [Errno 21] Is a directory: 'product_images/brand_6850/'
Image downloaded for brand 6653 and saved as product_images/brand_6653/D_NQ_NP_838620-MLM47393861689_092021-F.jpg
Error during download for brand 23416: [Errno 21] Is a directory: 'product_images/brand_23416/'
Error during download for brand 43700: [Errno 21] Is a directory: 'product_images/brand_43700/'
HTTP Error during download for brand 8699: HTTP Error 403
Image downloaded for brand 6255 and saved as product_images/brand_6255/br4_blizzak_vrx2.jpg
Image downloaded for brand 5947 and saved as product_images/brand_5947/5ef90680-f636-44ad-bbb9-464930a82e4b_1.493f806be8d47a94e169beb04e09133b.jpeg
Image downloaded for brand 47434 and saved as product_images/brand_47434/GUEST_16a593dd-e935-401d-9be8-01ac2a63752c
Image downloaded for brand 51782 and saved as product_images/brand_51782/GUEST_d352ab48-d9c2-4bb6-8d8a-b814a79500b1
Image downloaded for brand 6681 and saved as product_images/brand_6681/image.ashx
HTTP Error during download for brand 6982: HTTP Error 404

As you can see, after the last line of output the code keeps running but makes no further progress. I've also noticed that there are non-image files in the output directory (screenshot: https://i.stack.imgur.com/AnTHD.png). Thanks :)
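
For reference, here is a minimal sketch of how the per-row logic could be hardened, based on the symptoms in the output above. It is not a confirmed fix. It rests on three assumptions: the "[Errno 21] Is a directory" errors come from URLs whose path ends in "/" (so the file name is empty), the "Can't mix strings and bytes" error comes from a row whose image_url is bytes rather than str, and the apparent hang comes from requests.get being called without a timeout (requests does not time out by default). The helper names build_save_path and download_image, the product_{product_id} fallback name and the 10-second timeout are all hypothetical choices made for illustration.

```python
import os
from urllib.parse import urlparse


def build_save_path(image_url, brand_id, product_id, root="product_images"):
    """Return a str save path for the image, or None if the URL is unusable."""
    # Assumption: a bytes URL is what triggers "Can't mix strings and bytes".
    if isinstance(image_url, bytes):
        image_url = image_url.decode("utf-8", errors="replace")
    # Skip missing values (e.g. NaN) and empty strings.
    if not isinstance(image_url, str) or not image_url.strip():
        return None
    filename = os.path.basename(urlparse(image_url).path)
    # A URL path ending in "/" yields an empty name, so open() would be given
    # the directory itself; fall back to a name built from the product id.
    if not filename:
        filename = f"product_{product_id}"
    return os.path.join(root, f"brand_{brand_id}", filename)


def download_image(session, image_url, save_path, timeout=10):
    """Download one image; return True on success, False otherwise.

    `session` is assumed to be a requests.Session (or anything with a
    compatible get() method).
    """
    # An explicit timeout makes a stalled connection raise an exception
    # instead of blocking the loop indefinitely.
    with session.get(image_url, stream=True, timeout=timeout) as response:
        if response.status_code != 200:
            return False
        # Only save responses the server labels as images.
        content_type = response.headers.get("Content-Type", "")
        if not content_type.lower().startswith("image/"):
            return False
        # Open the file only after the checks pass, so that failed
        # downloads do not leave empty files behind.
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        with open(save_path, "wb") as file:
            for chunk in response.iter_content(8192):
                file.write(chunk)
    return True
```

Inside the existing loop, this would be called as download_image(session, image_url, save_path) with session = requests.Session() created once before the loop, skipping rows for which build_save_path returns None. The existing except requests.exceptions.RequestException branch would then also catch timeouts. Note that the timeout applies to each wait on the connection, not to the total download time, and that the Content-Type check depends on the server reporting the type correctly.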
