Encoder-decoder neural network architecture with different input and output sizes


I am trying to figure out what would be a good architecture for a neural network that takes projections (2D images) from different angles and reconstructs a volume consisting of 2D slices (CT-like).

So for example:

  • Input [180,100,100] -> 180 projections of image 100x100 pixels.
  • Output [100,100,100] -> Volume of size 100x100x100 (100 slices of 100x100 pixels)

I have ground truth volumes.

I came up with the idea of using a ResNet as the encoder, but I'm not sure how to implement the decoder, or which model would be a good choice for this kind of problem. I did consider a U-Net architecture, but since the output dimensions differ from the input, I abandoned that idea.

I am using PyTorch.

Best answer, by Karl:

Specifying the whole network is beyond the scope of a single answer, but generally you want something like this:

  1. Use a ResNet or vision transformer as the encoder.
  2. Use the encoder to map the input down to a latent tensor.
  3. Reshape the latent tensor as needed (e.g. into a small 3D feature map).
  4. Use ConvTranspose3d layers to upsample the latent tensor to the desired output size.
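The steps above can be sketched roughly as follows. The shapes match the question ([180, 100, 100] in, [100, 100, 100] out), but the layer widths, the tiny conv encoder, and the latent reshape scheme are all illustrative assumptions; in practice a ResNet trunk would replace the simple 2D encoder.

```python
import torch
import torch.nn as nn

class Projections2Volume(nn.Module):
    """Toy encoder-decoder: 2D encoder -> reshaped 3D latent -> ConvTranspose3d decoder."""

    def __init__(self, n_proj=180, latent_ch=200):
        super().__init__()
        # Step 1-2: treat the 180 projections as input channels of a 2D CNN
        # and map the input down to a latent tensor.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_proj, 64, 3, stride=2, padding=1),     # 100x100 -> 50x50
            nn.ReLU(inplace=True),
            nn.Conv2d(64, latent_ch, 3, stride=2, padding=1),  # 50x50 -> 25x25
            nn.ReLU(inplace=True),
        )
        # Step 4: upsample the reshaped [8, 25, 25, 25] latent to a 100^3 volume.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent_ch // 25, 32, 4, stride=2, padding=1),  # 25 -> 50
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1),               # 50 -> 100
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, 3, padding=1),  # collapse to a 1-channel volume
        )

    def forward(self, x):                    # x: [B, 180, 100, 100]
        z = self.encoder(x)                  # [B, 200, 25, 25]
        b, c, h, w = z.shape
        z = z.view(b, c // 25, 25, h, w)     # step 3: reshape into a 3D latent
        return self.decoder(z).squeeze(1)    # [B, 100, 100, 100]
```

Training against the ground-truth volumes would then be an ordinary regression setup, e.g. with an MSE loss between the predicted and true volumes.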

You can also do a U-Net-like setup with skip connections between encoder layers and decoder layers; you would just need a projection layer to map the encoder activations into a shape compatible with the decoder activations.
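One way such a projection layer could look, under assumed shapes (a 2D encoder activation of [B, 64, 50, 50] and a 3D decoder activation of [B, 32, 50, 50, 50]): a 1x1 convolution expands the 2D channels so they can be reshaped into a depth axis and concatenated with the decoder features, U-Net-style. All the channel counts here are hypothetical.

```python
import torch
import torch.nn as nn

class SkipProjection(nn.Module):
    """Map a 2D encoder activation into a 3D tensor and concatenate it
    with the matching decoder activation (assumed shapes, for illustration)."""

    def __init__(self, enc_ch=64, skip_ch=16, depth=50):
        super().__init__()
        self.depth = depth
        # 1x1 conv: enc_ch channels -> skip_ch * depth channels.
        self.proj = nn.Conv2d(enc_ch, skip_ch * depth, kernel_size=1)

    def forward(self, enc_feat, dec_feat):
        # enc_feat: [B, enc_ch, H, W]; dec_feat: [B, C, depth, H, W]
        b, _, h, w = enc_feat.shape
        skip = self.proj(enc_feat).view(b, -1, self.depth, h, w)
        # Concatenate along the channel axis, as in a U-Net skip connection.
        return torch.cat([skip, dec_feat], dim=1)  # [B, C + skip_ch, depth, H, W]
```

The spatial size (H, W) of the encoder activation must match the decoder activation's; otherwise an interpolation step would be needed before the concat.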