I have a model that extracts features from two different networks (`SwinTransformer3D()` and `MyNetwork(...)`) in parallel and then concatenates the two feature vectors obtained from the two networks.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # patch_size, dim, window_size, emb_dropout and num_features are defined elsewhere in my code
        self.features1 = SwinTransformer3D(pretrained=None,
                                           pretrained2d=False,
                                           patch_size=self.patch_size,
                                           in_chans=1,
                                           embed_dim=dim,
                                           depths=[2, 2, 6, 2],
                                           num_heads=[3, 6, 12, 24],
                                           window_size=window_size,  # (20, 7, 7)
                                           mlp_ratio=4.,
                                           qkv_bias=True,
                                           qk_scale=None,
                                           drop_rate=0.,
                                           attn_drop_rate=0.,
                                           drop_path_rate=0.2,
                                           norm_layer=torch.nn.LayerNorm,
                                           patch_norm=True,
                                           frozen_stages=-1,
                                           use_checkpoint=False)
        self.features2 = MyNetwork(...)
        self.dropout = nn.Dropout(emb_dropout)
        self.fc1 = nn.Linear(self.num_features, self.num_features)
        self.fc2 = nn.Linear(self.num_features, self.num_features)
        self.fc_out = nn.Linear(2 * self.num_features, self.num_features)

    def forward(self, x):
        x = self.to_patch_embedding(x)   # ln1
        x = x + pos_embed                # <-- the two lines my questions are about
        x = self.dropout(x)              # <--
        x1 = self.features1(x)           # ln2
        x1 = x1.view(x1.size(0), -1)
        x1 = F.relu(self.fc1(x1))
        x2 = self.features2(x)
        x2 = x2.view(x2.size(0), -1)
        x2 = F.relu(self.fc2(x2))
        # concatenate along dim 1 (the feature dimension)
        x = torch.cat((x1, x2), 1)
        x = self.fc_out(x)
        return x
```
I have a few questions:
Since SwinTransformer already computes parameters for its relative position bias, what is `pos_drop` for in the line `self.pos_drop = nn.Dropout(p=drop_rate)` inside `SwinTransformer()`? The second network, `MyNetwork(...)`, has

```python
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.pos_drop = nn.Dropout(p=drop_rate)
        ...

    def forward_features(self, x):
        ...
        x = x + self.pos_embed
        x = self.pos_drop(x)
        ...
        return x
```

where the position embedding is added in its `forward_features()` method. Doesn't this create a mismatch with the token positions used in `SwinTransformer()`, and should I remove it from this network?
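To spell out the concern behind this first question, `MyNetwork`'s position handling boils down to the snippet below (a minimal, stand-alone version; the values of `num_patches`, `embed_dim`, and `drop_rate` are placeholders I picked just so it runs), whereas, as far as I can tell, SwinTransformer keeps its relative position bias inside the attention blocks rather than adding anything to the tokens:

```python
import torch
import torch.nn as nn

# MyNetwork's position handling in isolation (placeholder sizes so the snippet runs)
num_patches, embed_dim, drop_rate = 196, 96, 0.1

pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))  # learned, absolute
pos_drop = nn.Dropout(p=drop_rate)

tokens = torch.randn(2, num_patches, embed_dim)  # (batch, patches, channels)
tokens = pos_drop(tokens + pos_embed)            # added once, before the transformer blocks
print(tokens.shape)                              # torch.Size([2, 196, 96])
```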
Since both networks receive the same input patches (`x`), should I add the position embedding in `MyModel`'s `forward()` method (between ln1 and ln2), remove `self.pos_embed` and `self.pos_drop` from `MyNetwork()`, and also remove `self.pos_drop = nn.Dropout(p=drop_rate)` from `SwinTransformer()`?
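To make this second question concrete, the arrangement I have in mind is roughly the sketch below: the position embedding and its dropout live only in the wrapper, and both branches receive already position-encoded tokens. The branch modules, `num_patches`, and `embed_dim` are just stand-ins so the sketch runs on its own; in my code the branches would be `SwinTransformer3D(...)` and `MyNetwork(...)` with its own `pos_embed`/`pos_drop` removed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the two feature extractors, only to make the sketch self-contained.
class BranchA(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)
    def forward(self, x):                    # x: (B, num_patches, embed_dim)
        return self.proj(x).mean(dim=1)      # (B, embed_dim)

class BranchB(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)
    def forward(self, x):
        return self.proj(x).mean(dim=1)

class CombinedModel(nn.Module):
    def __init__(self, num_patches=196, embed_dim=96, emb_dropout=0.1):
        super().__init__()
        # shared absolute position embedding, defined once in the wrapper
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.pos_drop = nn.Dropout(p=emb_dropout)
        self.features1 = BranchA(embed_dim)  # would be SwinTransformer3D(...)
        self.features2 = BranchB(embed_dim)  # would be MyNetwork(...) without pos_embed/pos_drop
        self.fc1 = nn.Linear(embed_dim, embed_dim)
        self.fc2 = nn.Linear(embed_dim, embed_dim)
        self.fc_out = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, x):                    # x: (B, num_patches, embed_dim), already patch-embedded
        x = x + self.pos_embed               # shared position information (between ln1 and ln2)
        x = self.pos_drop(x)
        x1 = F.relu(self.fc1(self.features1(x)))
        x2 = F.relu(self.fc2(self.features2(x)))
        return self.fc_out(torch.cat((x1, x2), dim=1))

# quick shape check with dummy tokens
tokens = torch.randn(2, 196, 96)
print(CombinedModel()(tokens).shape)         # torch.Size([2, 96])
```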
How might this affect training?
I would really appreciate your expert opinion on this: where should `pos_embed` and `pos_drop` go when a model is built from two different models running in parallel, each extracting different features?