Converting from PyTorch to TensorFlow for a Self-Attention Pooling Layer


I have found a PyTorch implementation of the said layer from the paper "Self-Attention Encoding and Pooling for Speaker Recognition", available here. However, due to CUDA compatibility issues, I can't use that code. Also, all my code so far has been written in TensorFlow. So, I want to do a one-to-one translation/conversion of the layer from PyTorch to TensorFlow.

First of all, this is the code in PyTorch:

import torch
import torch.nn as nn

class SelfAttentionPooling(nn.Module):
    def __init__(self, input_dim):
        super(SelfAttentionPooling, self).__init__()
        self.W = nn.Linear(input_dim, 1)
    
    def forward(self, batch_rep):
        """
        input:
            batch_rep : size (N, T, H), N: batch size, T: sequence length, H: Hidden dimension
      
        attention_weight:
            att_w : size (N, T, 1)
    
        return:
            utter_rep: size (N, H)
        """
        softmax = nn.functional.softmax
        att_w = softmax(self.W(batch_rep).squeeze(-1)).unsqueeze(-1)
        utter_rep = torch.sum(batch_rep * att_w, dim=1)

        return utter_rep
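
For reference, a quick shape check of the PyTorch layer (the input_dim of 128 and the batch and sequence sizes below are made-up values, just for illustration):

# (N, T, H) -> (N, H): 4 utterances, 9 frames, 128-dimensional hidden states
pooling = SelfAttentionPooling(input_dim=128)
batch_rep = torch.randn(4, 9, 128)
utter_rep = pooling(batch_rep)
assert utter_rep.shape == (4, 128)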

And this is my translation of that snippet to TensorFlow:

import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense, Softmax

class Self_Attention_Pooling(keras.layers.Layer):
    def __init__(self, input_dim):
        super(Self_Attention_Pooling, self).__init__()

        self.W = Dense(input_dim)

    def forward(self, batch_rep):
        softmax = Softmax()
        att_w = self.W(batch_rep)
        att_w = softmax(att_w)
        
        # Not so sure about these two lines though.
        #x = np.expand(batch_rep)
        #att_w = softmax(self.W(x))

        utter_rep = np.sum(batch_rep * att_w, axis=1)

        return utter_rep

Is my implementation/translation/conversion from PyTorch to TensorFlow correct? If not, please edit and help me.

Thank you very much.

1 Answer

Answered by M. Perier--Dulhoste:

Two remarks regarding your implementation:

  • For custom layers in TF, you should implement the call method instead of the forward method (see the TensorFlow guide on implementing custom layers).
  • For the operations, you should replace the NumPy functions with TensorFlow functions to enable GPU support (see the quick sketch after these remarks).
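
As a quick sketch of the second remark, these are the TensorFlow counterparts of the NumPy calls from your snippet (the shapes in the comments are just illustrative):

import tensorflow as tf

x = tf.random.normal((4, 9, 1))

# np.squeeze(x, -1)     -> tf.squeeze(x, axis=-1)
squeezed = tf.squeeze(x, axis=-1)             # (4, 9)
# np.expand_dims(x, -1) -> tf.expand_dims(x, axis=-1)
expanded = tf.expand_dims(squeezed, axis=-1)  # (4, 9, 1)
# np.sum(x, axis=1)     -> tf.reduce_sum(x, axis=1)
summed = tf.reduce_sum(x, axis=1)             # (4, 1)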

Here is the code I am using in TF for the SelfAttentionPooling:

import tensorflow as tf

class SelfAttentionPooling(tf.keras.layers.Layer):
    
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units=1, use_bias=False)
    
    def call(self, x: tf.Tensor) -> tf.Tensor:
        """Apply the self attention pooling on input tensor.
        
        Args:
            x: input tensor (?, seq_len, emb_dim)
        
        Returns:
            (?, emb_dim)
        """
        # (?, seq_len): squeeze only the last axis so a batch of size 1 keeps its batch dimension
        attention_weights = tf.nn.softmax(tf.squeeze(self.dense(x), axis=-1))
        
        # (?, emb_dim)
        pooled = tf.reduce_sum(tf.expand_dims(attention_weights, axis=-1) * x, axis=1)

        return pooled

You can quickly check it gives the expected output:

self_attn_pooling = SelfAttentionPooling()
# (?, seq_len, emb_dim)
input_shape = 4, 9, 128
x = tf.random.normal(input_shape)

pooled = self_attn_pooling(x)

# (?, emb_dim)
assert pooled.shape == (4, 128)
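
If it helps, here is a sketch of how the layer could sit inside a larger Keras model (the input shape and the 10-class Dense head are made-up values for illustration):

# Illustrative only: frame-level embeddings (seq_len=9, emb_dim=128) are pooled
# down to one utterance-level vector, then fed to a made-up 10-class classifier.
inputs = tf.keras.Input(shape=(9, 128))
pooled = SelfAttentionPooling()(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax")(pooled)
model = tf.keras.Model(inputs, outputs)
model.summary()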