I'm implementing a segmentation-style algorithm to classify 1D data: my model outputs a class prediction for each point in the 1D sequence. The architecture uses Conv1d and Dense layers. Since I would also like to output the uncertainty of each prediction, I'm trying to implement the SNGP approach (Liu et al., 2022, "A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness"), which requires adding spectral normalization to the hidden layers to prevent feature collapse and replacing the softmax output with a Gaussian Process layer. My question is: how do you choose the spectral norm bound for Dense and Conv1d layers? In the paper they used 6 for Conv2d layers and 0.95 for Dense layers. However, when I use the same values I get suboptimal results and training actually becomes more unstable (the opposite of what spectral normalization was introduced for).
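
For context, this is roughly the kind of constraint I'm experimenting with (a Keras-style sketch; the soft rescaling W ← W · min(1, c/σ(W)) follows the paper, but implementing it as a kernel constraint, the class name, and the layer hyperparameters below are my own choices, not the paper's reference code):

```python
import tensorflow as tf

class SoftSpectralNormConstraint(tf.keras.constraints.Constraint):
    """Rescales a kernel so that its largest singular value is at most `c`.

    Applies W <- W * min(1, c / sigma(W)) after each weight update, where
    sigma(W) is estimated from the kernel flattened to a 2-D matrix. For conv
    layers this bounds the norm of the reshaped kernel matrix (the usual
    approximation used by power-iteration implementations), not the exact
    norm of the convolution operator.
    """

    def __init__(self, c=1.0):
        self.c = c

    def __call__(self, w):
        # Conv1d kernels have shape (length, in_channels, out_channels);
        # flatten everything except the output dimension. Dense kernels are already 2-D.
        w2d = tf.reshape(w, [-1, w.shape[-1]])
        sigma = tf.linalg.svd(w2d, compute_uv=False)[0]  # largest singular value
        return w * tf.minimum(1.0, self.c / sigma)

    def get_config(self):
        return {"c": self.c}


# Hypothetical usage with the per-layer bounds from the paper:
conv = tf.keras.layers.Conv1D(
    64, 5, padding="same", activation="relu",
    kernel_constraint=SoftSpectralNormConstraint(c=6.0),
)
dense = tf.keras.layers.Dense(
    128, activation="relu",
    kernel_constraint=SoftSpectralNormConstraint(c=0.95),
)
```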
I printed out the spectral norm of each layer during training (without the spectral normalization constraint), roughly as in the monitoring sketch at the end of the post, and found that while the Conv1d layers stay a bit below 6, the Dense layers reach around 20 (especially towards the end of training). I suspect my predictions under spectral normalization were suboptimal because the constraint I set on the Dense layers was far too low (am I right?). However, I'm worried that if I set the bound too high (say around 15) I will lose the beneficial effect against feature collapse. Is there a rule of thumb, or an upper bound, for the spectral normalization value?
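
For reference, a minimal sketch of how I monitor the unconstrained spectral norms (assuming a Keras model; the callback name is hypothetical, and the norm is taken as the largest singular value of each flattened kernel):

```python
import tensorflow as tf

class SpectralNormMonitor(tf.keras.callbacks.Callback):
    """Prints the largest singular value of every layer kernel after each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        for layer in self.model.layers:
            kernel = getattr(layer, "kernel", None)
            if kernel is None:
                continue  # skip layers without a kernel (activations, pooling, ...)
            w2d = tf.reshape(kernel, [-1, kernel.shape[-1]])
            sigma = tf.linalg.svd(w2d, compute_uv=False)[0]
            print(f"epoch {epoch}: {layer.name} spectral norm = {float(sigma):.2f}")


# model.fit(x_train, y_train, epochs=..., callbacks=[SpectralNormMonitor()])
```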