How to get output_attentions from a pretrained DistilBERT model?

I am using a pretrained DistilBert model:

import tensorflow as tf
from transformers import TFDistilBertModel, DistilBertConfig

dbert = 'distilbert-base-uncased'

config = DistilBertConfig(max_position_embeddings=256,
                          dropout=0.2,
                          attention_dropout=0.2,
                          output_hidden_states=True,
                          output_attentions=True)

dbert_model = TFDistilBertModel.from_pretrained(dbert, config)

input_ids_in = tf.keras.layers.Input(shape=(256,), name='input_id', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(256,), name='attn_mask', dtype='int32') 

outputs = dbert_model([input_ids_in, input_masks_in], output_attentions=1)

I am trying to get the output_attentions, but the returned output has length 1:

TFBaseModelOutput([('last_hidden_state', <KerasTensor: shape=(None, 256, 768) dtype=float32 (created by layer 'tf_distil_bert_model_6')>)])

I have set output_attentions=True in the config, and output_attentions=1 is also passed in the forward call. Can anyone let me know what I am doing wrong?

EDIT: I have changed the default max_position_embeddings value of 512 to 256. If I change my model instantiation to

dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)

it gives me the following error:

ValueError: cannot reshape array of size 393216 into shape (256,768)

768 * 512 is 393216, the size of the checkpoint's (512, 768) position-embedding matrix, so the error seems to come from the changed max_position_embeddings in the config.
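(A possible workaround, sketched below and not tested here: 'distilbert-base-uncased' ships a (512, 768) position-embedding matrix, so leaving max_position_embeddings at its default of 512 and only shortening the inputs should avoid the reshape.)

# sketch: keep the checkpoint default max_position_embeddings (512) so the
# pretrained (512, 768) position-embedding matrix loads; shorter inputs
# (e.g. length 256) are still fine
config = DistilBertConfig(dropout=0.2,
                          attention_dropout=0.2,
                          output_hidden_states=True,
                          output_attentions=True)

dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased',
                                                config=config)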

Any ideas?

Best Answer

I am posting the answer as @cronoik suggested. I modified the model instantiation to

dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config, output_attentions=True)

This gave both the hidden states and the attentions in the output.
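For reference, here is a minimal end-to-end sketch (untested; it keeps the checkpoint's default max_position_embeddings and passes the config as a keyword, per the edit in the question) showing where the attentions end up:

import tensorflow as tf
from transformers import TFDistilBertModel, DistilBertConfig

config = DistilBertConfig(dropout=0.2,
                          attention_dropout=0.2,
                          output_hidden_states=True,
                          output_attentions=True)

dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased',
                                                config=config)

input_ids_in = tf.keras.layers.Input(shape=(256,), name='input_id', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(256,), name='attn_mask', dtype='int32')

outputs = dbert_model([input_ids_in, input_masks_in])

# outputs.attentions is a tuple with one tensor per transformer layer
# (6 for distilbert-base-uncased), each of shape
# (batch_size, num_heads=12, seq_len, seq_len)
print(len(outputs.attentions))      # 6
print(outputs.attentions[0].shape)  # (None, 12, 256, 256)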