I am using a BERT model to generate text embeddings. My strings look like "There is pneumonia detected in the left corner". When I encode a batch of 20 strings and print the model output, it returns a tensor of shape [20, 256], where 20 is the batch size and 256 is the size of each output vector. In other words, the model turns each text into a single vector of size 256, i.e. [1, 256] per string.
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
from tensorflow.keras import layers


def create_text_encoder(
    num_projection_layers, projection_dims, dropout_rate, trainable=False
):
    # Load the BERT preprocessing module.
    preprocess = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/2",
        name="text_preprocessing",
    )
    # Load the pre-trained BERT model to be used as the base encoder.
    bert = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1",
        name="bert",
    )
    # Set the trainability of the base encoder.
    bert.trainable = trainable
    # Receive the text as inputs.
    inputs = layers.Input(shape=(), dtype=tf.string, name="text_input")
    # Preprocess the text.
    bert_inputs = preprocess(inputs)
    # Generate embeddings for the preprocessed text using the BERT model.
    embeddings = bert(bert_inputs)["pooled_output"]
    # Project the embeddings produced by the model (project_embeddings is defined elsewhere).
    outputs = project_embeddings(
        embeddings, num_projection_layers, projection_dims, dropout_rate
    )
    # Create the text encoder model.
    return keras.Model(inputs, outputs, name="text_encoder")
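For context, this is roughly how I call the encoder; the projection settings and the repeated example string are just placeholders, not my real data:

# Build the encoder; the argument values here are example values.
text_encoder = create_text_encoder(
    num_projection_layers=1, projection_dims=256, dropout_rate=0.1
)

# A batch of 20 strings (shortened here for illustration).
texts = tf.constant(["There is pneumonia detected in the left corner"] * 20)

embeddings = text_encoder(texts)
print(embeddings.shape)  # (20, 256) -> one 256-d vector per string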
Now I want to divide each string into 5 patches when feeding a single string such as "There is pneumonia detected in the left corner" to the model above. Previously, the model produced an embedding of shape [1, 256] for a single string; instead, I want it to produce [5, 256] for a single text, i.e. five vectors per text, each of size 256.
Is this possible? Has someone done it before?
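To illustrate what I have in mind, here is a rough sketch: split one string into five word-level "patches" (split_into_patches is a hypothetical helper and the chunking rule is arbitrary), then encode each patch separately. Is something like this the right approach, or is there a way to do it inside the model itself (e.g. using BERT's token-level sequence_output)?

import numpy as np

def split_into_patches(text, num_patches=5):
    # Hypothetical helper: split the sentence into num_patches roughly equal word chunks.
    words = text.split()
    chunks = np.array_split(words, num_patches)
    return [" ".join(chunk) for chunk in chunks]

patches = split_into_patches("There is pneumonia detected in the left corner")
patch_embeddings = text_encoder(tf.constant(patches))
print(patch_embeddings.shape)  # (5, 256) -> five vectors for one text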