I'm implementing an LSTM model for an Arabic dataset using BERT feature representations, and I've used the 'asafaya/bert-base-arabic' model for this purpose:

```python
from transformers import AutoModelForMaskedLM

bert_model = AutoModelForMaskedLM.from_pretrained('asafaya/bert-base-arabic')
```
Now I'm facing the challenge of creating an embedding_matrix to use in the following statement:

```python
model_LSTM.add(Embedding(vocab_length, embedding_vector_features,
                         weights=[embedding_matrix],
                         input_length=length_long_sentence))
```
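If I read the Keras docs correctly, weights expects a single array of shape (vocab_length, embedding_vector_features), i.e. one static vector per word in my vocabulary.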
Given that BERT provides contextual embeddings, the feature representation of the same word varies with its context, so there is no single static vector per word to place in the matrix.
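For example, the ambiguous word 'ذهب' ("went" / "gold") should get different hidden states depending on the sentence. A quick sketch to illustrate what I mean (the sentences are my own examples, and I'm assuming 'ذهب' is a single token in BERT's vocabulary):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('asafaya/bert-base-arabic')
model = AutoModel.from_pretrained('asafaya/bert-base-arabic')

sent_a = tokenizer('ذهب الولد إلى المدرسة', return_tensors='pt')  # "the boy went to school"
sent_b = tokenizer('اشترى خاتما من ذهب', return_tensors='pt')     # "he bought a ring of gold"

# Locate the shared word 'ذهب' in each sentence (assumes it is a single token)
target = tokenizer.convert_tokens_to_ids('ذهب')
pos_a = (sent_a['input_ids'][0] == target).nonzero()[0].item()
pos_b = (sent_b['input_ids'][0] == target).nonzero()[0].item()

with torch.no_grad():
    vec_a = model(**sent_a).last_hidden_state[0, pos_a]  # 'ذهب' as a verb
    vec_b = model(**sent_b).last_hidden_state[0, pos_b]  # 'ذهب' as a noun

# Cosine similarity is well below 1.0: same word, different vectors
print(torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0))
```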
I would appreciate any guidance or suggestions on how to effectively create the embedding matrix for this scenario. Thank you!
I tried the following:

```python
from transformers import AutoModelForMaskedLM

def bert_embedding_matrix():
    # Load BERT with hidden states exposed
    bert = AutoModelForMaskedLM.from_pretrained(
        "asafaya/bert-base-arabic",
        output_hidden_states=True,
    )
    # First child of the masked-LM wrapper is the base BertModel,
    # and its first child is the embeddings module
    bert_embeddings = list(bert.children())[0]
    bert_word_embeddings = list(bert_embeddings.children())[0]
    # Static (non-contextual) word-embedding weight matrix, shape (32000, 768)
    mat = bert_word_embeddings.word_embeddings.weight
    return mat

embedding_matrix = bert_embedding_matrix()
```
but I get the following error:

```
ValueError: Layer embedding_1 weight shape (8155, 300) is not compatible with provided weight shape torch.Size([32000, 768]).
```
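The mismatch makes sense: I'm handing Keras BERT's full (32000, 768) table while the layer expects (vocab_length, embedding_vector_features) = (8155, 300). One direction I'm considering is to build an aligned matrix by looking up each word of my own vocabulary in BERT's static table. A rough sketch (word_index here stands for my Keras tokenizer's word-to-id dict, which is an assumption about my setup; sub-token vectors are averaged when BERT splits a word):

```python
import numpy as np
from transformers import AutoTokenizer

bert_tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic")
# embedding_matrix is the (32000, 768) torch Parameter from above
bert_weights = embedding_matrix.detach().numpy()

embedding_dim = bert_weights.shape[1]  # 768, so embedding_vector_features would have to be 768
aligned_matrix = np.zeros((vocab_length, embedding_dim))

for word, idx in word_index.items():
    # BERT sub-token ids for this word, without [CLS]/[SEP]
    token_ids = bert_tokenizer.encode(word, add_special_tokens=False)
    if token_ids:
        # Average sub-token vectors when BERT splits the word into pieces
        aligned_matrix[idx] = bert_weights[token_ids].mean(axis=0)
```

This would make weights=[aligned_matrix] match the expected (8155, 768) shape once embedding_vector_features is set to 768, but it discards the contextual part of BERT entirely, so I'm not sure it's the right approach.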