Challenges in Fine-Tuning a Causal Language Model using Triplet Dataset and Cosine Similarity


I'm trying to fine-tune a causal language model. I built a triplet dataset and computed a cosine-similarity score (dot product of normalized embeddings) for each query/positive and query/negative pair, then computed the loss with cross-entropy. However, neither the training loss nor the validation loss is decreasing well.

I suspect the cosine similarity is the problem. When I swapped the language model for roberta-base, cosine similarity gave poor results, while the raw (unnormalized) dot product worked well. For the causal language model, though, the dot-product scores are huge because its embedding dimension is large (4096), so I had to normalize the last hidden states. But if I compute the loss using only the dot product, I sometimes get a loss of exactly zero.

Is there a way to compute the similarity between the query and the positive/negative examples without normalization?
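For context, a common remedy in contrastive setups (used, e.g., in SimCSE- and CLIP-style training) is to keep the normalization but divide the cosine scores by a temperature before the cross-entropy: raw cosine scores live in [-1, 1], so the softmax is nearly uniform and the loss barely moves, which matches the symptom described above. Below is a minimal PyTorch sketch of this idea; the function name `contrastive_loss` and the temperature value are illustrative assumptions, not from the question.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query: torch.Tensor,
                     pos: torch.Tensor,
                     neg: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """Cross-entropy over [query·pos, query·neg] cosine scores.

    query, pos, neg: (batch, dim) embeddings, e.g. pooled last hidden states.
    temperature: scales cosine scores so the softmax is not near-uniform;
    a hypothetical default of 0.05 is used here.
    """
    # Normalize so dot products become cosine similarities in [-1, 1].
    query = F.normalize(query, dim=-1)
    pos = F.normalize(pos, dim=-1)
    neg = F.normalize(neg, dim=-1)

    # Per-example cosine similarity scores, shape (batch,).
    sim_pos = (query * pos).sum(dim=-1)
    sim_neg = (query * neg).sum(dim=-1)

    # Temperature scaling sharpens the distribution; without it,
    # logits in [-1, 1] give gradients that are too small to train on.
    logits = torch.stack([sim_pos, sim_neg], dim=1) / temperature

    # The positive is always at index 0.
    labels = torch.zeros(query.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```

With this scaling the model can keep normalized embeddings (avoiding the exploding 4096-dimensional dot products) while still producing logits spread wide enough for cross-entropy to give a useful training signal.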
