How can I train a BERT model for a domain-specific representation learning task?

I am trying to generate good sentence embeddings for a specific type of text using sentence-transformer models, but when I test similarity and cluster the embeddings with k-means, the results are not good. Any ideas for improving them? I was thinking of training one of the sentence-transformer models on my dataset (which consists of plain sentences without any labels). How can I retrain the existing models specifically on my data to generate better embeddings? Thanks.
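For context, my current pipeline looks roughly like this; the model checkpoint and the number of clusters are just placeholders:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Unlabeled, domain-specific sentences (shortened here)
sentences = [
    "first domain-specific sentence",
    "second domain-specific sentence",
    "third domain-specific sentence",
]

# Encode with an off-the-shelf sentence-transformer checkpoint
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint
embeddings = model.encode(sentences)

# Cluster the embeddings; the number of clusters is a guess
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(embeddings)
print(labels)
```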
1 Answer
The sentence embeddings produced by a pre-trained BERT model are generic and may not be appropriate for every task or domain.
To solve this problem:
Fine-tune the model on a task-specific corpus for the given downstream task. If the end goal is classification, fine-tune the model for classification and later reuse the embeddings from the fine-tuned BERT model. (This is the approach suggested for USE embeddings, especially when the model otherwise remains a black box.)
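A minimal sketch of this option, assuming you have (or can create) labels for a classification task; the checkpoint, labels, and hyperparameters below are placeholders, not part of the original answer:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical labeled examples from your domain
texts = ["a domain-specific sentence", "another domain-specific sentence"]
labels = torch.tensor([0, 1])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A few passes over one toy batch, just to illustrate the fine-tuning loop
model.train()
for _ in range(3):
    optimizer.zero_grad()
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()

# After fine-tuning, reuse the underlying encoder to produce embeddings
model.eval()
with torch.no_grad():
    hidden = model.bert(**enc).last_hidden_state  # (batch, seq_len, hidden)
    embeddings = hidden.mean(dim=1)               # simple mean pooling
```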
Fine-tune the model in an unsupervised manner using the masked language modeling (MLM) objective. This doesn't require you to know the downstream task beforehand; you simply reuse BERT's original pre-training strategy to adapt the model to your corpus, which fits your case of having only unlabeled sentences.
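A minimal sketch of MLM-based domain adaptation with the Hugging Face transformers and datasets libraries; the checkpoint, masking probability, and training hyperparameters are placeholders chosen for illustration:

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Your unlabeled, domain-specific sentences
sentences = ["a domain-specific sentence", "another domain-specific sentence"]
dataset = Dataset.from_dict({"text": sentences})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Randomly mask 15% of tokens, as in the original BERT pre-training
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-adapted",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()

model.save_pretrained("bert-domain-adapted")
tokenizer.save_pretrained("bert-domain-adapted")
```

The adapted checkpoint can then be wrapped in a SentenceTransformer (a transformer module plus a pooling layer) so the similarity and k-means clustering pipeline from the question can stay unchanged.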