How to access the value projection of a MultiheadAttention layer in PyTorch


I'm writing my own implementation of the Graphormer architecture. This architecture adds an edge-based bias to the result of the key-query multiplication in the self-attention mechanism, so I am adding that bias by hand and multiplying the data by the attention weights outside the attention module:

import torch as th
from torch import nn

# Variable initialization
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first = True)

# Tensors
x = th.randn(B, T, C)
attn_bias = th.ones((B, T, T))

#  Self-attention mechanism
_, attn_wei = self_attn(query=x, key=x, value=x)

# Adding attention bias
if attn_bias is not None:
    attn_wei = attn_wei + attn_bias

x = attn_wei @ x # TODO use value(x) instead of x


This works, but to use the full potential of self-attention, the last matrix multiplication should be x = attn_wei @ value(x). However, I can't find the value projection in the self_attn object, even though it must have something like that inside it.

How could I do this?
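
One possible approach, as a minimal sketch: when the key and value dimensions equal the embedding dimension (the default), nn.MultiheadAttention stores the query, key and value projections stacked in in_proj_weight and in_proj_bias, so the value projection is the last third of each. The helper names value and attend below are hypothetical, not part of PyTorch. The sketch also assumes the default bias=True and dropout=0, and it asks for per-head weights with average_attn_weights=False so that each head is applied to its own slice of the values.

```python
import torch as th
from torch import nn

th.manual_seed(0)
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)

x = th.randn(B, T, C)
attn_bias = th.ones((B, T, T))

# in_proj_weight has shape (3*C, C) and in_proj_bias has shape (3*C,);
# the three chunks are the query, key and value projections, in that order.
_, _, w_v = self_attn.in_proj_weight.chunk(3, dim=0)
_, _, b_v = self_attn.in_proj_bias.chunk(3, dim=0)


def value(t):
    """Hypothetical helper: apply only the value projection, (B, T, C) -> (B, T, C)."""
    return nn.functional.linear(t, w_v, b_v)


def attend(t, bias=None):
    """Hypothetical helper: redo the weighted sum outside the module."""
    # Per-head weights of shape (B, H, T, T) instead of the head average.
    ref, attn_wei = self_attn(t, t, t, average_attn_weights=False)
    if bias is not None:
        # Same post-softmax addition as in the question, broadcast over heads.
        attn_wei = attn_wei + bias[:, None]
    # Split the projected values into heads: (B, H, T, C // H).
    v = value(t).view(B, T, H, C // H).transpose(1, 2)
    heads = attn_wei @ v
    # Merge the heads back and apply the module's output projection.
    merged = heads.transpose(1, 2).reshape(B, T, C)
    return self_attn.out_proj(merged), ref


out_plain, out_ref = attend(x)           # without bias: should match the module
out_biased, _ = attend(x, attn_bias)     # with the hand-added bias
```

Without a bias, out_plain should agree with the module's own output, which is a quick way to check that the slicing is right. Note that this adds the bias after the softmax, as the code in the question does. If the bias is meant to be added to the key-query scores before the softmax, the PyTorch documentation states that a float attn_mask is added to the attention scores, so passing the bias as attn_mask with shape (B * H, T, T) may make the manual multiplication unnecessary.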

