I am using a pretrained DistilBERT model:
import tensorflow as tf
from transformers import TFDistilBertModel, DistilBertConfig

dbert = 'distilbert-base-uncased'
config = DistilBertConfig(max_position_embeddings=256,
                          dropout=0.2,
                          attention_dropout=0.2,
                          output_hidden_states=True,
                          output_attentions=True)
dbert_model = TFDistilBertModel.from_pretrained(dbert, config)
input_ids_in = tf.keras.layers.Input(shape=(256,), name='input_id', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(256,), name='attn_mask', dtype='int32')
outputs = dbert_model([input_ids_in, input_masks_in], output_attentions=1)
I am trying to get the attention weights (output_attentions), but the returned output has length 1:
TFBaseModelOutput([('last_hidden_state', <KerasTensor: shape=(None, 256, 768) dtype=float32 (created by layer 'tf_distil_bert_model_6')>)])
I have set "output_attentions=True" in the config and "output_attentions=1" in the forward pass. Can anyone tell me what I am doing wrong?
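As a sanity check (just a diagnostic sketch, assuming the code above has been run), the effective config of the loaded model can be printed to see whether the custom flags were applied at all:

print(dbert_model.config.output_attentions)       # False here would mean the custom config never reached the model
print(dbert_model.config.max_position_embeddings) # 512 here would mean the default config was used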
EDIT:
I have changed the default configuration value of max_position_embeddings
from 512 to 256. If I change my model instantiation to
dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased',config=config)
it gives me the following error:
ValueError: cannot reshape array of size 393216 into shape (256,768)
768 * 512 is 393216, so the error presumably comes from the changed max_position_embeddings value in the config.
Any ideas?
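For context: the pretrained checkpoint stores a position-embedding matrix of shape (512, 768), which cannot be loaded into a (256, 768) slot, hence the reshape failure. A possible workaround (just a sketch, assuming the goal is simply sequences of length 256) is to leave max_position_embeddings at its pretrained value and truncate/pad the inputs with the tokenizer instead:

from transformers import DistilBertTokenizerFast, DistilBertConfig, TFDistilBertModel

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
# max_position_embeddings is left at its default of 512 so the checkpoint weights load cleanly
config = DistilBertConfig(dropout=0.2, attention_dropout=0.2,
                          output_hidden_states=True, output_attentions=True)
model = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)

# Sequences of length 256 are fine; the model simply uses the first 256 position embeddings.
enc = tokenizer(["an example sentence"], max_length=256, truncation=True,
                padding='max_length', return_tensors='tf')
out = model(enc['input_ids'], attention_mask=enc['attention_mask'])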
I am posting the answer as @cronoik suggested: I modified the code to
dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config, output_attentions=True)
This returned both the hidden states and the attentions in the output.
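For completeness, a short sketch of reading the extra outputs back (assuming a tokenizer as in the sketch above). With both flags enabled, the model returns last_hidden_state, hidden_states (a tuple of 7 tensors: the embedding output plus 6 transformer layers), and attentions (a tuple of 6 tensors, one per layer, each of shape (batch, num_heads, seq_len, seq_len)):

enc = tokenizer(["an example sentence"], return_tensors='tf')
out = dbert_model(enc['input_ids'], attention_mask=enc['attention_mask'])

print(out.last_hidden_state.shape)  # (1, seq_len, 768)
print(len(out.hidden_states))       # 7: embedding output + 6 layers
print(len(out.attentions))          # 6: one attention map per layer
print(out.attentions[0].shape)      # (1, 12, seq_len, seq_len)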