I was trying to build a CNN with PyTorch and had difficulty with max-pooling. I have taken Stanford's CS231n. As I recall, max-pooling can be used as a dimensionality-reduction step: for example, I have a (1, 20, height, width) input to max_pool2d (assuming my batch size is 1), and if I use a (1, 1) kernel, I want an output of shape (1, 1, height, width), which means the kernel should slide over the channel dimension. However, the PyTorch docs say the kernel slides over height and width. Thanks to @ImgPrcSng on the PyTorch forum, who told me to use max_pool3d, which turned out to work well. But there is still a reshape operation between the output of the conv2d layer and the input of the max_pool3d layer, so it is hard to wrap into an nn.Sequential. Is there another way to do this?
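For reference, the reshape + max_pool3d workaround described in the question can be wrapped in a small module so that it composes with nn.Sequential. This is a minimal sketch; `ChannelMaxPool` is a hypothetical name, and it assumes input of shape (batch, channels, height, width):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelMaxPool(nn.Module):
    """Max-pool over the channel dimension by treating channels as the
    'depth' axis of a 3-D pooling op (the reshape + max_pool3d trick)."""
    def __init__(self, num_channels):
        super().__init__()
        self.num_channels = num_channels

    def forward(self, x):
        # x: (batch, channels, height, width).
        # Add a dummy axis so channels become the depth dimension:
        # (batch, 1, channels, height, width)
        x = x.unsqueeze(1)
        # Pool across all channels at once; spatial dims are untouched:
        # (batch, 1, 1, height, width)
        x = F.max_pool3d(x, kernel_size=(self.num_channels, 1, 1))
        # Drop the dummy axis: (batch, 1, height, width)
        return x.squeeze(1)

# Because the reshape lives inside forward(), it fits in nn.Sequential:
model = nn.Sequential(
    nn.Conv2d(3, 20, kernel_size=3, padding=1),
    ChannelMaxPool(20),
)
out = model(torch.randn(1, 3, 32, 32))  # shape (1, 1, 32, 32)
```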
Pytorch maxpooling over channels dimension
10.9k views, asked by Sun Chuanneng

There are 3 answers below.
To max-pool at each spatial coordinate over all channels, simply use the Reduce layer from einops:
from einops.layers.torch import Reduce
max_pooling_layer = Reduce('b c h w -> b 1 h w', 'max')
This layer can be used in your model like any other torch module.
I'm not sure why the other answers are so complicated. Max-pooling over the whole channel dimension to get an output with only one channel is equivalent to just taking the maximum value over that dimension. Would something like this work?
torch.amax(left_images, dim=1, keepdim=True)
Or, using einops: `reduce(left_images, 'b c h w -> b 1 h w', 'max')`.
Note: if your goal is just dimensionality reduction, people generally use 1x1 conv layers (mapping a large number of channels to a small one) rather than max-pooling over the channel dimension.
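To make the note concrete, here is a sketch contrasting the parameter-free channel max with the learned 1x1 convolution alternative; the shapes and variable names are illustrative assumptions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 20, 32, 32)

# Pure max over the channel dimension (no learnable parameters):
pooled = torch.amax(x, dim=1, keepdim=True)

# Learned channel reduction with a 1x1 convolution, as the note suggests
# (a weighted combination of channels rather than a hard max):
reducer = nn.Conv2d(in_channels=20, out_channels=1, kernel_size=1)
reduced = reducer(x)

print(pooled.shape, reduced.shape)  # both torch.Size([1, 1, 32, 32])
```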