About BertForMaskedLM

I have recently read about BERT and want to use BertForMaskedLM for the fill-mask task. I know about the BERT architecture. Also, as far as I know, BertForMaskedLM is built from BERT with a language modeling head on top, but I have no idea what a language modeling head means here. Can anyone give me a brief explanation?

There are 2 answers below.

BEST ANSWER

BertForMaskedLM, as you have correctly understood, uses a language modeling (LM) head.

Generally, as well as in this case, the LM head is a linear layer whose input dimension is the hidden size (768 for BERT-base) and whose output dimension is the vocabulary size. Thus, it maps each hidden-state output of the BERT model to a score for every token in the vocabulary. The loss is then computed from these scores with respect to the target token (e.g., with cross-entropy).
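As a rough illustration, here is a minimal PyTorch sketch of such a head. This is simplified: the actual Hugging Face implementation also applies a dense transform with an activation and LayerNorm before the final projection, and ties the projection weights to the input embeddings. The sizes 768 and 30522 are those of bert-base-uncased, assumed here for concreteness.

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 768, 30522   # bert-base-uncased sizes (assumed)

# The LM head: a linear layer from hidden size to vocabulary size.
lm_head = nn.Linear(hidden_size, vocab_size)

# Pretend BERT produced hidden states for 1 sequence of 10 tokens.
hidden_states = torch.randn(1, 10, hidden_size)
logits = lm_head(hidden_states)        # shape: (1, 10, vocab_size)

# Training loss: cross-entropy of the scores against the target token ids.
targets = torch.randint(0, vocab_size, (1, 10))
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
print(logits.shape, loss.item())
```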


In addition to @Ashwin Geet D'Sa's answer, here is Hugging Face's definition of the model head:

The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension.

You can find Hugging Face's definitions of other terms on the glossary page: https://huggingface.co/docs/transformers/glossary
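For completeness, here is a minimal sketch of the fill-mask task mentioned in the question, using the transformers pipeline API. "bert-base-uncased" is just an example checkpoint; any masked-LM checkpoint works.

```python
from transformers import pipeline

# The fill-mask pipeline loads a BertForMaskedLM model under the hood.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Print the top predictions for the masked position with their scores.
for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```

At each [MASK] position, this applies the LM head described in the accepted answer to rank every token in the vocabulary.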