How can we apply masked language modelling (MLM) to paired text and image inputs using multimodal models like LXMERT? For example, given the text "This is a [MASK]" with one word masked out, and an image (say, of a cat), how can the model be made to predict the masked word as "cat"? How can I implement this and get MLM scores out of it using the HuggingFace Transformers API? A code snippet illustrating this would be great and would help my understanding.
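Here is a rough sketch of what I have in mind, assuming the image has already been turned into region features with an external Faster R-CNN detector (as in the original LXMERT setup — HuggingFace does not extract these for you). The random `visual_feats`/`visual_pos` tensors below are only placeholders showing the expected shapes; with real detector features, would reading the MLM scores off `prediction_logits` like this be the right approach?

```python
import torch
from transformers import LxmertTokenizer, LxmertForPreTraining

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertForPreTraining.from_pretrained("unc-nlp/lxmert-base-uncased")
model.eval()

# Placeholder visual inputs: in practice these come from a Faster R-CNN
# feature extractor (36 region features of dim 2048 + normalized box coords).
num_boxes = 36
visual_feats = torch.randn(1, num_boxes, 2048)  # (batch, boxes, feat_dim)
visual_pos = torch.rand(1, num_boxes, 4)        # (batch, boxes, x1/y1/x2/y2)

text = "This is a [MASK]"
inputs = tokenizer(text, return_tensors="pt")
mask_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    outputs = model(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        visual_feats=visual_feats,
        visual_pos=visual_pos,
    )

# prediction_logits has shape (batch, seq_len, vocab_size): the MLM head scores.
mlm_logits = outputs.prediction_logits[0, mask_index]
probs = mlm_logits.softmax(dim=-1)
top_scores, top_ids = probs.topk(5, dim=-1)
for ids, scores in zip(top_ids, top_scores):
    print([(tokenizer.decode([i]), round(s.item(), 4)) for i, s in zip(ids, scores)])
```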
