How to generate multiple patches of a single string with a BERT model


I am using a BERT model to generate text embeddings. My strings look like "There is pneumonia detected in the left corner". When I encode() and pass a batch of 20 strings and print the model output, it returns a tensor of shape [20, 256], where 20 is the batch size and 256 is the size of each output vector. In other words, each string is encoded as a single vector of shape [1, 256].

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_hub as hub


def create_text_encoder(
        num_projection_layers, projection_dims, dropout_rate, trainable=False):

    # Load the BERT preprocessing module.
    preprocess = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/2",
        name="text_preprocessing",
    )

    # Load the pre-trained BERT model to be used as the base encoder.
    bert = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1",
        name="bert",
    )

    # Set the trainability of the base encoder.
    bert.trainable = trainable

    # Receive the text as inputs.
    inputs = layers.Input(shape=(), dtype=tf.string, name="text_input")

    # Preprocess the text.
    bert_inputs = preprocess(inputs)

    # Generate embeddings for the preprocessed text using the BERT model.
    embeddings = bert(bert_inputs)["pooled_output"]

    # Project the embeddings produced by the model
    # (project_embeddings is a helper defined elsewhere).
    outputs = project_embeddings(
        embeddings, num_projection_layers, projection_dims, dropout_rate)

    # Create the text encoder model.
    return keras.Model(inputs, outputs, name="text_encoder")

Now I want to divide each string into 5 patches. After feeding the single string "There is pneumonia detected in the left corner" to the model above, instead of the current [1, 256] embedding per string, the model should generate an output of shape [5, 256]: five vectors for a single text, each of size 256.

Is it possible? Has anyone done it before?
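For reference, here is a minimal sketch of one possible patching strategy I have considered: splitting the text into 5 contiguous word chunks and treating each chunk as its own string. The `split_into_patches` helper below is hypothetical (not part of the model above); the idea is that passing the resulting list of 5 strings through the text encoder as a batch would yield a [5, 256] output instead of [1, 256].

```python
def split_into_patches(text, num_patches=5):
    """Split `text` into `num_patches` contiguous chunks of words."""
    words = text.split()
    # Distribute the words as evenly as possible across the patches.
    base, extra = divmod(len(words), num_patches)
    patches, start = [], 0
    for i in range(num_patches):
        size = base + (1 if i < extra else 0)
        patches.append(" ".join(words[start:start + size]))
        start += size
    return patches

patches = split_into_patches("There is pneumonia detected in the left corner")
# patches -> ["There is", "pneumonia detected", "in the", "left", "corner"]
```

Feeding `patches` (a list of 5 strings) to the encoder, e.g. `text_encoder(tf.constant(patches))`, should then return one embedding per patch, i.e. a [5, 256] tensor, assuming the encoder above. Whether a different patching scheme (e.g. overlapping windows) would be better is part of my question.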
