Use OpenAI CLIP with a torch tensor or NumPy array as input


How do I correctly use a PyTorch tensor or an OpenCV image as input for OpenAI CLIP?

I tried the following, but it didn't work so far:

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
clip_preprocess(torch.from_numpy(OpenCVImage)).unsqueeze(0).to(device)
  • the preprocess step fails with the message Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
  • OpenCVImage is a NumPy array that was already converted to RGB with cv2.cvtColor(..., cv2.COLOR_BGR2RGB)

There are 2 answers below

BEST ANSWER

If you read the transforms code for CLIP, it shows that you need a PIL Image object, not a NumPy array or a torch tensor. These lines define the preprocessing pipeline:

def _transform(n_px):
    return Compose([
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ])

Most torchvision transforms are written for PIL images, not NumPy arrays or torch tensors, which is why the pipeline above fails on a tensor. So you have to convert each image to PIL first via

from PIL import Image
image = Image.fromarray(np_img)
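
For example, a minimal end-to-end sketch; the file path and the variable names below are placeholders, not taken from the question:

import clip
import cv2
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

bgr = cv2.imread("frame.jpg")  # placeholder path; OpenCV loads images as BGR
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

pil_image = Image.fromarray(rgb)  # NumPy HWC uint8 array -> PIL Image
image_input = clip_preprocess(pil_image).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = clip_model.encode_image(image_input)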

or you can apply the last two transforms manually, like

my_transform = Compose([
    ToTensor(),
    Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
# Use it on the NumPy image directly: ToTensor accepts a HWC uint8 array,
# so there is no need for torch.from_numpy
my_transform(OpenCVImage).unsqueeze(0).to(device)
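
Note that this skips the Resize and CenterCrop steps, so the array must already be 224x224 for ViT-B/32. A sketch of the whole manual route, assuming OpenCVImage is an RGB uint8 array; the cv2.resize call here is a stand-in for the skipped transforms, not an exact equivalent:

import cv2
import torch
from torchvision.transforms import Compose, Normalize, ToTensor

device = "cuda" if torch.cuda.is_available() else "cpu"
my_transform = Compose([
    ToTensor(),  # HWC uint8 in [0, 255] -> CHW float in [0, 1]
    Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])

# 224 is the ViT-B/32 input resolution; a plain resize stands in for Resize + CenterCrop
resized = cv2.resize(OpenCVImage, (224, 224), interpolation=cv2.INTER_CUBIC)
image_input = my_transform(resized).unsqueeze(0).to(device)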
ANSWER

It seems like a system-related issue (the SIGKILL). Exit code 137 usually means the process was killed by the OS, most often by the out-of-memory killer.
What system are you using? Which OS? What type of device? Is it possible you exceeded a CPU/GPU memory limit?
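
If you suspect memory, a quick diagnostic sketch; psutil is an extra dependency, and torch.cuda.mem_get_info needs a reasonably recent PyTorch:

import psutil
import torch

# Host RAM available at this point in the program
print(psutil.virtual_memory())

# GPU memory, if CUDA is available
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"GPU memory free/total: {free_bytes / 1e9:.2f} / {total_bytes / 1e9:.2f} GB")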