Challenges in Fine-Tuning a Causal Language Model using Triplet Dataset and Cosine Similarity


I'm trying to fine-tune a causal language model. I built a triplet dataset and computed a cosine-similarity score (dot product of normalized embeddings) for each query/positive and query/negative pair, then computed the loss with cross-entropy. However, neither the training loss nor the validation loss is decreasing well.

I suspect the cosine similarity is the problem. When I swapped the language model for roberta-base, cosine similarity gave poor results, while the raw (unnormalized) dot product worked well. For the causal language model, though, the dot-product scores are huge because its embedding dimension is large (4096), so I had to normalize the last hidden states. But if I compute the loss using only the dot product, I sometimes get a loss of exactly zero.

Is there a way to compute the similarity between the query and the positive/negative examples without normalization?
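For context, a common remedy in contrastive setups (used, e.g., in SimCSE- and CLIP-style training) is to keep the normalization but divide the cosine scores by a temperature before the cross-entropy: raw cosine scores live in [-1, 1], so the softmax is nearly uniform and the loss barely moves, which matches the symptom described above. Below is a minimal PyTorch sketch of this idea; the function name `contrastive_loss` and the temperature value are illustrative assumptions, not from the question.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query: torch.Tensor,
                     pos: torch.Tensor,
                     neg: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """Cross-entropy over [query·pos, query·neg] cosine scores.

    query, pos, neg: (batch, dim) embeddings, e.g. pooled last hidden states.
    temperature: scales cosine scores so the softmax is not near-uniform;
    a hypothetical default of 0.05 is used here.
    """
    # Normalize so dot products become cosine similarities in [-1, 1].
    query = F.normalize(query, dim=-1)
    pos = F.normalize(pos, dim=-1)
    neg = F.normalize(neg, dim=-1)

    # Per-example cosine similarity scores, shape (batch,).
    sim_pos = (query * pos).sum(dim=-1)
    sim_neg = (query * neg).sum(dim=-1)

    # Temperature scaling sharpens the distribution; without it,
    # logits in [-1, 1] give gradients that are too small to train on.
    logits = torch.stack([sim_pos, sim_neg], dim=1) / temperature

    # The positive is always at index 0.
    labels = torch.zeros(query.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```

With this scaling the model can keep normalized embeddings (avoiding the exploding 4096-dimensional dot products) while still producing logits spread wide enough for cross-entropy to give a useful training signal.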
