Image to Text - Hugging Face API Inference - Input Error


I would like to write a Python script that sends a POST request to the Hugging Face Inference API for an image-to-text model. The model is nlpconnect/vit-gpt2-image-captioning. I'm having trouble sending the image: the POST request returns a 400 error. The script is as follows:

import base64
import requests
import os

def query(API_TOKEN):
    model = 'Salesforce/blip-image-captioning-large'
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    image_path = "./demo.jpg"

    # Check if the image file exists
    if not os.path.isfile(image_path):
        return {"error": "Image file does not exist"}

    with open(image_path, "rb") as image_file:
        try:
            # Try to encode the image file
            encoded_string = base64.b64encode(image_file.read()).decode()
        except Exception as e:
            return {"error": f"Error encoding image: {str(e)}"}

    data = {
        "inputs": {
            "images": [encoded_string],  # using the base64 encoded string
            "texts": ["a photography of"]  # Optional, based on your current class logic
        }
    }

    try:
        # Try to send a request to the API endpoint
        response = requests.post(
            f'https://api-inference.huggingface.co/models/{model}',
            headers=headers,
            json=data
        )
    except Exception as e:
        return {"error": f"Error sending request: {str(e)}"}

    return response.json()

The function returns the error: {'error': ["Error in inputs: Invalid image: {'images': ['/9j/4AAQSkZJRgABAQEA8ADwAA...zm2Z8+UaGwKf/Z'], 'texts': ['a photography of']}"]}.

I’m struggling to identify the source of my error. Could someone help me? Thank you!


The image I provided is demo.jpg, downloaded in a notebook with: !wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
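For comparison, here is a minimal sketch of one variation worth trying. It assumes the hosted image-to-text pipeline expects the raw image bytes as the request body, not a JSON object with base64 strings. The error message is consistent with that assumption, since it reports the whole inputs dictionary as the "invalid image", but this is not confirmed. The helper name prepare_request is hypothetical, and the sketch drops the "a photography of" prompt because it is unclear whether the hosted endpoint accepts one.

```python
from pathlib import Path

# Endpoint base taken from the script in the question.
API_BASE = "https://api-inference.huggingface.co/models"


def prepare_request(image_path, api_token,
                    model="Salesforce/blip-image-captioning-large"):
    """Build (url, headers, body) for an image-to-text request.

    Assumption: the endpoint wants the raw image bytes as the body,
    with no base64 encoding and no JSON wrapper.
    """
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image file does not exist: {image_path}")

    url = f"{API_BASE}/{model}"
    headers = {"Authorization": f"Bearer {api_token}"}
    body = path.read_bytes()  # raw binary content of the image file
    return url, headers, body


def query(image_path, api_token):
    """Send the image and return the decoded JSON response."""
    # Third-party dependency, imported here so prepare_request
    # stays usable without it.
    import requests

    url, headers, body = prepare_request(image_path, api_token)
    # data= sends the bytes unchanged; json= would serialize a dict instead.
    response = requests.post(url, headers=headers, data=body, timeout=60)
    return response.json()
```

Calling query("./demo.jpg", API_TOKEN) would then send the file contents directly. If the 400 error persists, printing response.status_code and response.text before decoding the JSON may show more detail about what the server rejected.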
