PyTorch MultiHeadAttention implementation


In PyTorch's MultiheadAttention implementation, regarding in_proj_weight: is it true that the first embed_dim rows correspond to the query projection, the next embed_dim rows to the key projection, and the final embed_dim rows to the value projection? Just confirming.

There is a related question asked in the same context, but it doesn't answer my specific question.

Yes, that is the case.

You can see how in_proj_weight is used in the _in_projection_packed helper in torch/nn/functional.py. Its docstring describes the layout:

"projection weights for q, k and v, packed into a single tensor. Weights are packed along dimension 0, in q, k, v order."
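You can also check this empirically. The sketch below (dimensions and seed are illustrative) unpacks in_proj_weight into its three embed_dim-row slices, computes attention manually with those slices, and compares against the module's own output:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_heads, seq_len = 8, 2, 4
head_dim = embed_dim // num_heads

# bias=False so the comparison only involves in_proj_weight and out_proj.
mha = nn.MultiheadAttention(embed_dim, num_heads, bias=False).eval()

# in_proj_weight has shape (3 * embed_dim, embed_dim): rows 0:embed_dim
# are the query projection, embed_dim:2*embed_dim the key, and
# 2*embed_dim:3*embed_dim the value -- the packed q, k, v order.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)

x = torch.randn(seq_len, 1, embed_dim)  # (seq, batch, embed_dim)

with torch.no_grad():
    # Manual multi-head attention using the unpacked slices.
    q = (x @ w_q.T).view(seq_len, num_heads, head_dim).transpose(0, 1)
    k = (x @ w_k.T).view(seq_len, num_heads, head_dim).transpose(0, 1)
    v = (x @ w_v.T).view(seq_len, num_heads, head_dim).transpose(0, 1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    out = (F.softmax(scores, dim=-1) @ v)
    manual = mha.out_proj(out.transpose(0, 1).reshape(seq_len, 1, embed_dim))

    reference, _ = mha(x, x, x)

print(torch.allclose(manual, reference, atol=1e-5))
```

If the slices were in a different order, the manual computation would disagree with the module, so a True here confirms the q, k, v packing.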