Is BertForSequenceClassification using the CLS vector?


In the Hugging Face source code for BertForSequenceClassification, pooled_output = outputs[1] is used:

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

Shouldn't it be pooled_output = outputs[0]? (This answer mentioning BertPooler seems to be outdated)
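For context, this is a minimal sketch of how I have been inspecting the two outputs on a plain BertModel (the checkpoint and variable names are just my example, not from the library source):

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Hello, world!", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs[0].shape)  # last_hidden_state: (batch, seq_len, hidden) - one vector per token
    print(outputs[1].shape)  # pooler_output: (batch, hidden) - the [CLS] vector passed through BertPooler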

Based on this answer, it seems that the [CLS] token learns a sentence-level representation. I am confused about why/how masked language modelling would lead to the start token learning a sentence-level representation. (I had been thinking that BertForSequenceClassification freezes the BERT model and only trains the classification head, but maybe that's not the case.)
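As a quick sanity check (a rough sketch; the checkpoint name is just my example), I looked at whether any parameters of BertForSequenceClassification come frozen out of the box:

    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Count parameters that would not receive gradients during fine-tuning
    frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
    print(f"Frozen parameters: {len(frozen)}")  # prints 0, so the whole encoder is trained too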

Would a sentence embedding be equivalent to, or even better than, the [CLS] token embedding?
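By "sentence embedding" I mean something like mean pooling over the token embeddings; here is a rough sketch of the two representations I want to compare (all names here are my own):

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("An example sentence.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    cls_embedding = outputs[1]  # (1, 768): [CLS] passed through the tanh BertPooler

    # Mask-aware mean pooling over the per-token embeddings in outputs[0]
    mask = inputs["attention_mask"].unsqueeze(-1).float()      # (1, seq_len, 1)
    mean_embedding = (outputs[0] * mask).sum(1) / mask.sum(1)  # (1, 768)

    print(cls_embedding.shape, mean_embedding.shape)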
