I have been reading about the TabNet model and the "explanations" of its predictions obtained from the attentive transformers' mask values.
However, if the input values are not normalized, couldn't these mask values simply act as a normalization scalar rather than a measure of feature importance?
E.g.: if a feature Time is expressed in days and has a mean mask value of 1/365, couldn't that simply mean the mask is rescaling the feature to a unit range rather than reflecting its importance? The sketch below shows the comparison I have in mind.
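This is a minimal sketch of the experiment I'm thinking of, assuming the dreamquark-ai `pytorch-tabnet` package; the synthetic data, the column index for Time, and the raw-vs-standardized comparison are all just illustrative assumptions on my part:

```python
# Sketch: compare TabNet's aggregated mask importance for a days-scale "Time"
# feature when it is fed raw vs. when all features are standardized first.
import numpy as np
from sklearn.preprocessing import StandardScaler
from pytorch_tabnet.tab_model import TabNetClassifier

rng = np.random.default_rng(0)
n = 2000
time_days = rng.uniform(0, 365, size=n)        # "Time" feature, expressed in days
other = rng.normal(size=(n, 3))                # other features, roughly unit scale
X = np.column_stack([time_days, other]).astype(np.float32)
y = (time_days > 180).astype(np.int64)         # target depends only on Time

def mean_time_importance(X_fit):
    clf = TabNetClassifier(verbose=0)
    clf.fit(X_fit, y, max_epochs=50)
    # explain() returns the per-sample aggregated masks; column 0 is "Time" here
    explain_matrix, masks = clf.explain(X_fit)
    return explain_matrix[:, 0].mean()

print("raw scale:    ", mean_time_importance(X))
print("standardized: ", mean_time_importance(
    StandardScaler().fit_transform(X).astype(np.float32)))
```

If the mask value on the raw data really were just compensating for the feature's scale, I would expect the two runs above to give very different importances for Time, even though the underlying signal is identical.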
Let me know if my question isn't clear.