According to PyTorch's documentation on Dropout1d:
Randomly zero out entire channels (a channel is a 1D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 1D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Then does that mean, with a tensor of shape (batch, time, channels), permute(0, 2, 1) should be used together with F.dropout1d(), so that the dropout affects the channel dimension?
x = x.permute(0, 2, 1) # convert to [batch, channels, time]
x = F.dropout1d(x, p)
x = x.permute(0, 2, 1) # back to [batch, time, channels]
And will this piece of code be equivalent to TensorFlow's SpatialDropout1D?
That's correct: this piece of code will zero out values along the channel dimension and scale the remaining outputs by 1/(1 - p), so the expected sum over all inputs stays unchanged. That matches the behavior of TensorFlow's SpatialDropout1D. A code snippet to compare what the outputs look like:
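A minimal sketch, assuming both torch (1.12 or later, where F.dropout1d was added) and tensorflow are installed; the toy shapes (batch 2, 4 time steps, 3 channels) and p = 0.5 are arbitrary choices for illustration:

import torch
import torch.nn.functional as F
import tensorflow as tf

p = 0.5
x = torch.ones(2, 4, 3)  # toy input: [batch=2, time=4, channels=3], all ones

# PyTorch: move channels to dim 1, drop whole channels, move them back
out_pt = F.dropout1d(x.permute(0, 2, 1), p=p, training=True).permute(0, 2, 1)

# TensorFlow: SpatialDropout1D works on [batch, time, channels] directly
out_tf = tf.keras.layers.SpatialDropout1D(p)(tf.ones((2, 4, 3)), training=True)

print(out_pt)
print(out_tf.numpy())
# In both outputs, every channel (last dimension) is either zeroed across
# all time steps or kept and scaled by 1/(1 - p) = 2.0. The random masks
# differ because the two frameworks use independent RNGs, but the
# channel-wise dropout pattern is the same.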