Feeding an image to stacked resnet blocks to create an embedding

278 Views Asked by Mona Jalal At 27 July 2025 at 22:56

Do you have any code example or paper that refers to something like the following diagram?

I want to know why we want to stack multiple resnet blocks as opposed to multiple convolutional block as in more traditional architectures? Any code sample or referring to one will be really helpful.

Also, how can I transfer that to something like the following that can contain self-attention module for each resnet block?

Original Q&A

There are 1 best solutions below

Shai On 24 August 2021 at 05:10

Applying self-attention to the outputs of Resnet blocks at the very high resolution of the input image may lead to memory issues: The memory requirements of self-attention blocks grow quadratically with the input size (=resolution). This is why in, e.g., Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He Non-Local Neural Networks (CVPR 2018) they introduced self-attention only at a very deep layer of the architecture, once the feature map was substantially sub-sampled.

Feeding an image to stacked resnet blocks to create an embedding

There are 1 best solutions below

Related Questions in PYTORCH

Related Questions in COMPUTER-VISION

Related Questions in RESNET

Related Questions in ATTENTION-MODEL

Related Questions in SELF-ATTENTION

Trending Questions

Popular # Hahtags

Popular Questions