PyTorch MultiHeadAttention implementation

58 Views Asked by carpet119 At 29 July 2025 at 06:42

In PyTorch's MultiHeadAttention implementation, regarding in_proj_weight, is it true that the first embed_dim rows (along dimension 0) correspond to the query, the next embed_dim rows correspond to the key, and the final embed_dim rows correspond to the value? Just confirming.

This is a question asked in the same context, but it doesn't answer my specific question.

There is 1 best solution below

Karl On 16 February 2024 at 03:25

Yes, that is the case.

You can see how in_proj_weight is used in the _in_projection_packed function (in torch/nn/functional.py). Its docstring describes the packed weight argument as:

projection weights for q, k and v, packed into a single tensor. Weights
are packed along dimension 0, in q, k, v order.
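
If you want to check this yourself, here is a minimal sketch, assuming default module settings (bias enabled, no dropout, kdim and vdim equal to embed_dim). It splits in_proj_weight into three chunks along dimension 0, recomputes the attention by hand in q, k, v order, and compares the result with the module's own output. The sizes and the split_heads helper are made up for illustration and are not part of PyTorch.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_heads = 8, 2          # illustrative sizes, not from the question
head_dim = embed_dim // num_heads

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True).eval()

# in_proj_weight has shape (3 * embed_dim, embed_dim); split it along dim 0.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = mha.in_proj_bias.chunk(3, dim=0)

# Distinct query/key/value inputs, so a wrong ordering would change the result.
query = torch.randn(2, 5, embed_dim)
key = torch.randn(2, 7, embed_dim)
value = torch.randn(2, 7, embed_dim)


def split_heads(t):
    # Hypothetical helper: (batch, seq, embed_dim) -> (batch, heads, seq, head_dim)
    b, s, _ = t.shape
    return t.view(b, s, num_heads, head_dim).transpose(1, 2)


with torch.no_grad():
    q = split_heads(F.linear(query, w_q, b_q))
    k = split_heads(F.linear(key, w_k, b_k))
    v = split_heads(F.linear(value, w_v, b_v))

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    context = torch.softmax(scores, dim=-1) @ v

    # Merge heads back to (batch, seq, embed_dim) and apply the output projection.
    context = context.transpose(1, 2).reshape(2, 5, embed_dim)
    manual_out = mha.out_proj(context)

    module_out, _ = mha(query, key, value)

matches = torch.allclose(manual_out, module_out, atol=1e-5)
print("q, k, v order reproduces the module output:", matches)
```

The same slicing can be written as in_proj_weight[:embed_dim], in_proj_weight[embed_dim:2 * embed_dim] and in_proj_weight[2 * embed_dim:]. Note that when kdim or vdim differs from embed_dim, the module keeps separate q_proj_weight, k_proj_weight and v_proj_weight parameters instead of the packed tensor.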
