I am doing multi-label classification on tabular data with fastai (set up with the multi-label helper described in the fastai documentation), and the model is performing great. However, when I inspected the model more closely, I noticed that the output layer is just a linear transformation rather than a Sigmoid activation. More specifically, the architecture is:
TabularModel(
(embeds): ModuleList(
(0): Embedding(25, 10)
(1): Embedding(9, 5)
(2): Embedding(33, 11)
(3): Embedding(32, 11)
(4): Embedding(8, 5)
(5): Embedding(207, 32)
(6): Embedding(3, 3)
(7): Embedding(3, 3)
(8): Embedding(3, 3)
(9): Embedding(3, 3)
(10): Embedding(3, 3)
(11): Embedding(3, 3)
(12): Embedding(5, 4)
)
(emb_drop): Dropout(p=0.0, inplace=False)
(bn_cont): BatchNorm1d(314, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layers): Sequential(
(0): LinBnDrop(
(0): BatchNorm1d(410, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): Linear(in_features=410, out_features=200, bias=False)
(2): ReLU(inplace=True)
)
(1): LinBnDrop(
(0): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): Linear(in_features=200, out_features=100, bias=False)
(2): ReLU(inplace=True)
)
(2): LinBnDrop(
(0): Linear(in_features=100, out_features=4, bias=True)
)
)
)
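For reference, the learner was built with the usual multi-label tabular recipe, roughly like this (a sketch from memory; df, cat_cols, cont_cols and label_cols are placeholders for my actual data):

# Rough setup sketch -- names below are placeholders, not my real columns
from fastai.tabular.all import *

to = TabularPandas(
    df,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=cat_cols,      # 13 categorical columns -> the 13 embeddings above
    cont_names=cont_cols,    # 314 continuous columns -> BatchNorm1d(314)
    y_names=label_cols,      # 4 one-hot label columns -> out_features=4
    y_block=MultiCategoryBlock(encoded=True, vocab=label_cols),
    splits=RandomSplitter()(range_of(df)),
)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, layers=[200, 100])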
I noticed that Jason Brownlee wrote here that for multi-label classification the activation function of the output layer should be nn.Sigmoid(). I tried replacing the last layer with a LinBnDrop(act=nn.Sigmoid()) layer, but I got no improvement in validation loss and worse overall performance.
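Roughly, the replacement looked like this (a sketch; the 100 and 4 are the in/out sizes of the final Linear layer shown above):

# Swap the final LinBnDrop (plain Linear(100, 4)) for one that also applies a Sigmoid
from fastai.tabular.all import *

learn = tabular_learner(dls, layers=[200, 100])
learn.model.layers[-1] = LinBnDrop(100, 4, bn=False, act=nn.Sigmoid())
learn.fit_one_cycle(5)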
Why does tabular_model automatically use a linear output layer for multi-label classification?