In PyTorch's MultiheadAttention implementation, regarding in_proj_weight, is it true that the first embed_dim rows correspond to the query, the next embed_dim rows correspond to the key, and the final embed_dim rows correspond to the value? Just confirming.
This is a question asked in the same context, but it doesn't answer my specific question.
Yes, that is the case.
You can see how in_proj_weight is used in the _in_projection_packed function.
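One way to check this yourself is to split in_proj_weight into three chunks along dimension 0 and recompute self-attention by hand. The sketch below is only an illustration: the sizes, the input tensor, and the split_heads helper are made up for the example, and it assumes the default case where kdim and vdim equal embed_dim (otherwise in_proj_weight is None and separate q_proj_weight, k_proj_weight, and v_proj_weight are used instead).

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, num_heads = 8, 2          # arbitrary illustrative sizes
head_dim = embed_dim // num_heads
batch, seq_len = 2, 5                # arbitrary illustrative sizes

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True).eval()

# in_proj_weight has shape (3 * embed_dim, embed_dim).
# Assumed layout being tested: rows [0, E) -> query, [E, 2E) -> key, [2E, 3E) -> value.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = mha.in_proj_bias.chunk(3, dim=0)

x = torch.randn(batch, seq_len, embed_dim)


def split_heads(t):
    # Hypothetical helper: (B, L, E) -> (B, H, L, D)
    return t.view(t.shape[0], t.shape[1], num_heads, head_dim).transpose(1, 2)


with torch.no_grad():
    # Project the input with each chunk, as a linear layer would (x @ W^T + b).
    q = split_heads(x @ w_q.T + b_q)
    k = split_heads(x @ w_k.T + b_k)
    v = split_heads(x @ w_v.T + b_v)

    # Scaled dot-product attention per head.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(head_dim)
    attn = scores.softmax(dim=-1) @ v                      # (B, H, L, D)

    # Merge heads back to (B, L, E) and apply the output projection.
    merged = attn.transpose(1, 2).reshape(batch, seq_len, embed_dim)
    manual = mha.out_proj(merged)

    # Reference output from the module itself (self-attention).
    expected, _ = mha(x, x, x, need_weights=False)

matches = torch.allclose(manual, expected, atol=1e-5)
print(matches)
```

If the query, key, value ordering is as described, the manual result should agree with the module's output. Swapping w_q and w_k in the sketch changes the attention scores, so the comparison would generally fail with a different ordering.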