How can I train a BERT model for a domain-specific representation learning task?


I am trying to generate good sentence embeddings for a specific type of text using sentence-transformer models, but testing the similarity and clustering with k-means doesn't give good results. Any ideas for improving them? I was thinking of training one of the sentence-transformer models on my dataset, which consists of plain sentences without any labels. How can I retrain the existing models specifically on my data to generate better embeddings? Thanks.
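For reference, a minimal sketch of the setup described above; the model name, example sentences, and cluster count are placeholders, not details taken from the question:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Placeholder corpus; in practice these would be the domain-specific sentences.
sentences = ["first domain-specific sentence", "second domain-specific sentence",
             "another sentence from the same domain", "one more example sentence"]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any pre-trained sentence-transformer
embeddings = model.encode(sentences)              # (n_sentences, embedding_dim) numpy array

kmeans = KMeans(n_clusters=2, random_state=0).fit(embeddings)
print(kmeans.labels_)                             # cluster assignment per sentence
```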


There is 1 answer below.


The sentence embeddings produced by a pre-trained BERT model are generic and may not be appropriate for every task.

To solve this problem:

  1. Fine-tune the model on your task-specific corpus for the given task. If the end goal is classification, fine-tune the model for classification, and later use the embeddings from the BERT encoder. (This is the approach suggested for USE embeddings, especially when the model remains a black box.) See the first sketch below.

  2. Fine-tune the model in an unsupervised manner with masked language modelling. This doesn't require you to know the task beforehand; you simply reuse BERT's original training strategy to adapt the model to your corpus. See the second sketch below.
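A hedged sketch of option 1 with the Hugging Face `Trainer`: the backbone name, toy labelled examples, and mean-pooling choice are assumptions for illustration only.

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"                      # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labelled data standing in for the task-specific corpus.
train_ds = Dataset.from_dict({"text": ["great product", "terrible service"],
                              "label": [1, 0]})
train_ds = train_ds.map(lambda x: tokenizer(x["text"], truncation=True,
                                            padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-task-finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
)
trainer.train()

# After fine-tuning, reuse the encoder for sentence embeddings,
# e.g. mean-pooled last hidden states.
inputs = tokenizer(["a new domain sentence"], return_tensors="pt",
                   padding=True, truncation=True).to(model.device)
with torch.no_grad():
    hidden = model.bert(**inputs).last_hidden_state   # (batch, seq_len, hidden_size)
embedding = hidden.mean(dim=1)                        # simple mean pooling
```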
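And a hedged sketch of option 2, masked language modelling on your unlabelled in-domain sentences; again the model name and hyperparameters are placeholders, not prescriptions.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bert-base-uncased"                      # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabelled, in-domain sentences (placeholders here).
sentences = ["first in-domain sentence", "second in-domain sentence"]
ds = Dataset.from_dict({"text": sentences})
ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
            remove_columns=["text"])

# Randomly masks 15% of tokens, reproducing BERT's masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-mlm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-domain-mlm")  # the adapted encoder can then back a sentence-transformer
```

The saved checkpoint can be loaded as the word-embedding backbone of a new sentence-transformer model, so the domain-adapted weights end up producing the sentence embeddings.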