In the traditional residual block, is the "addition" of layer N to the output of layer N+2 (prior to non-linearity) element-wise addition or concatenation?
The literature indicates something like this:
X1 = X
X2 = relu(conv(X1))
X3 = conv(X2)
X4 = relu(X3 + X1)
It has to be element-wise; with concatenation you don't get a residual function. One also has to be careful to use a padding mode that makes the convolutions produce outputs with the same spatial dimensions as the block input, otherwise the element-wise addition is not defined.
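As a minimal sketch (in PyTorch, with an illustrative class name and batch norm omitted for brevity), a basic identity-shortcut block could look like this, where padding=1 on a 3x3 convolution with stride 1 preserves the spatial size so the element-wise sum with the input is well defined:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 3x3 convs with padding=1 and stride 1 keep H and W unchanged,
        # so the output can be added element-wise to the block input.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))  # X2 = relu(conv(X1))
        out = self.conv2(out)        # X3 = conv(X2)
        return F.relu(out + x)       # X4 = relu(X3 + X1): element-wise add, then ReLU

x = torch.randn(1, 64, 32, 32)
block = BasicResidualBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32]) -- same shape as the input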