I'm working on a binary segmentation task using a Vision-Transformer-like network. However, the ViT produces the final mask from the last layer's output only, ignoring the middle layers' features. I wonder what each vision transformer layer actually does. Does each layer extract increasingly global features? If so, do the features simply keep getting better as the number of layers increases?
I tried to visualize some of the feature vectors, but it is hard to draw any intuitive conclusions from them. I would like to know whether there are papers that discuss this same problem, and what their findings are.
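For context on what I've tried so far: I collect the per-layer token features with forward hooks and then inspect them. Below is a minimal sketch of that setup, assuming a toy ViT-style encoder (`TinyViT` is a hypothetical stand-in, not my actual network); the same hook pattern applies to any model whose transformer blocks are accessible as submodules.

```python
import torch
import torch.nn as nn

# Hypothetical minimal ViT-style encoder: patch embedding + a stack of
# transformer blocks. A stand-in for illustration, not the real network.
class TinyViT(nn.Module):
    def __init__(self, dim=64, depth=6, heads=4, num_patches=196):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, dim)  # 16x16 RGB patches
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, patches):
        x = self.patch_embed(patches) + self.pos
        for blk in self.blocks:
            x = blk(x)
        return x

features = {}

def make_hook(name):
    # Store each block's output tokens so they can be inspected later.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model = TinyViT().eval()
for i, blk in enumerate(model.blocks):
    blk.register_forward_hook(make_hook(f"block_{i}"))

with torch.no_grad():
    patches = torch.randn(1, 196, 16 * 16 * 3)  # one dummy image's patches
    _ = model(patches)

# `features` now holds one [1, 196, 64] tensor per layer; comparing their
# token norms or pairwise cosine similarities across depth is one way to
# see how the representations change from layer to layer.
```

With the per-layer tensors captured this way, one can plot, for example, the cosine similarity between each layer's tokens and the final layer's tokens, which gives a rough picture of how quickly the representation converges with depth.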