I'm getting an error when using the Donut Model: Input Type and Bias type should be the same

I'm trying to extract text from an image using the Donut model, which is an image parser. It seems that the input image is not in the proper format.

I'm getting an error that says: RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same on this line:

output = model.inference(image=image, prompt="<s_cord-v2>")

Here is my entire code:

    from donut import DonutModel
    from PIL import Image
    import torch

    model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

    if torch.cuda.is_available():
        model.half()
        device = torch.device("cuda")
        model.to(device)
    else:
        model.encoder.to(torch.bfloat16)

    model.eval()

    image = Image.open("testfolder/test1.jpg").convert("RGB")
    output = model.inference(image=image, prompt="<s_cord-v2>")
    output

I understand that image is not in the right format, but how would I go about fixing that?

There is 1 best solution below

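You can fix the error by removing the else clause from your code. You are presumably running on a CPU, so the else branch casts the encoder weights to bfloat16 while the input image tensor stays float32, which is the mismatch the error message reports. This code works fine for me: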

    from donut import DonutModel
    from PIL import Image
    import torch

    model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

    if torch.cuda.is_available():
        model.half()
        device = torch.device("cuda")
        model.to(device)

    model.eval()
    image = Image.open("./donut/misc/sample_image_cord_test_receipt_00004.png").convert("RGB")
    output = model.inference(image=image, prompt="<s_cord-v2>")
    output
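
If you want to see the mismatch in isolation, here is a minimal sketch using plain PyTorch only. It does not load Donut: the `Conv2d` layer is just a hypothetical stand-in for one encoder layer, and the random tensor stands in for a preprocessed image. Assuming a CPU-only machine, casting the layer to bfloat16 while the input stays float32 should raise the same kind of RuntimeError, and keeping the weights in float32 avoids it:

```python
import torch

# Stand-in for one encoder layer (assumption: the real Donut encoder is not loaded here).
conv = torch.nn.Conv2d(3, 8, kernel_size=3)
image = torch.rand(1, 3, 32, 32)  # float32, like a preprocessed image tensor

# Reproduce the mismatch: bfloat16 weights and bias, float32 input.
conv.to(torch.bfloat16)
try:
    conv(image)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("mismatch:", err)

# Fix: keep the weights in float32 on CPU, i.e. skip the bfloat16 cast.
conv.float()
with torch.no_grad():
    out = conv(image)
print(out.dtype, tuple(out.shape))
```

The general rule is that the input tensor and the layer parameters need the same dtype, so either both are cast or neither is. Removing the else clause is the "neither" option.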