I'm trying to extract text from an image using the Donut model, which is an image parser. It seems that the input image is not in the proper format.
I'm getting an error that says:
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
on this line:
output = model.inference(image=image, prompt="<s_cord-v2>")
Here is my entire code:
    from donut import DonutModel
    from PIL import Image
    import torch
    model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
    if torch.cuda.is_available():
        model.half()
        device = torch.device("cuda")
        model.to(device)
    else:
        model.encoder.to(torch.bfloat16)
    model.eval()
    image = Image.open("testfolder/test1.jpg").convert("RGB")
    output = model.inference(image=image, prompt="<s_cord-v2>")
    output
I understand that the image is not in the right format, but how would I go about fixing that?
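
The error message describes a mismatch between the data type of the input tensor and the data type of a layer's parameters. Here is a minimal sketch of that situation, assuming only PyTorch. The small `Conv2d` layer and the random tensor are hypothetical stand-ins for illustration, not Donut's actual encoder or preprocessing:

```python
import torch

# Hypothetical stand-in for a convolution inside an image encoder
# (not Donut's actual layer).
conv = torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)

# Same kind of cast as model.encoder.to(torch.bfloat16) in the question.
conv.to(torch.bfloat16)

# A float32 tensor, standing in for a preprocessed image.
image = torch.rand(1, 3, 8, 8)

try:
    conv(image)
    mismatch_raised = False
except RuntimeError as err:
    # The input is float32 while the layer's parameters are bfloat16.
    mismatch_raised = True
    print(err)

# Keeping the layer in float32 makes the dtypes agree again.
conv.float()
output = conv(image)
print(output.dtype, tuple(output.shape))
```

The first call fails because the layer was cast while the input was not. The second call succeeds once both sides use float32.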
 
                        
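You can fix the error by removing the else-clause in your code. I suppose you are running on a CPU, and casting the encoder to bfloat16 there is probably wrong for the model: the error says the input is float while the bias is BFloat16, so the image format itself is probably not the problem. This code works fine for me: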