I am trying to replace a Masking layer (which masks out time steps) by simply passing a sample weight of 0 or 1 for each time step. The TensorFlow docs for the losses say that if sample_weight is a scalar, the loss is simply scaled by that value, and if it is a tensor, each loss element is scaled by the corresponding weight: https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy
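A minimal sketch of that scaling behaviour, assuming a single sequence of 7 time steps with 3 classes and random toy targets and predictions (all names here are illustrative). It compares the summed loss computed with 0/1 per-step weights against the sum of the unweighted per-step losses for the last four steps only.

```python
import numpy as np
import tensorflow as tf

# Toy data (assumption): 1 sequence, 7 time steps, 3 classes.
num_steps, num_classes = 7, 3
rng = np.random.default_rng(1)

# One-hot targets for each time step, shape (1, 7, 3).
y_true = tf.one_hot(rng.integers(0, num_classes, size=(1, num_steps)), num_classes)
# Predicted probabilities per time step (softmax of random logits), shape (1, 7, 3).
y_pred = tf.nn.softmax(rng.normal(size=(1, num_steps, num_classes)).astype("float32"))

# Per-time-step weights: the first three steps are "masked out".
weights = tf.constant([[0, 0, 0, 1, 1, 1, 1]], dtype=tf.float32)

# Unreduced loss, one value per time step, shape (1, 7).
per_step_loss = tf.keras.losses.CategoricalCrossentropy(reduction="none")(y_true, y_pred)

# Weighted loss, summed over all time steps.
weighted_sum = tf.keras.losses.CategoricalCrossentropy(reduction="sum")(
    y_true, y_pred, sample_weight=weights
)

# The same quantity computed by hand: only the steps with weight 1 contribute.
manual_sum = tf.reduce_sum(per_step_loss[:, 3:])
print(float(weighted_sum), float(manual_sum))
```

I used reduction="sum" to keep the comparison simple. With the default "sum_over_batch_size" reduction, my understanding is that the weighted sum is divided by the total number of elements (7 here), so the zero-weighted steps still count toward the average, which is not the same normalisation you would get from a mask.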
I need to know whether the gradients are non-zero for the time steps with weight 0, where sample_weights = [0,0,0,1,1,1,1] for 7 time steps. I could have tried it myself, but I am really confused by the results and I don't know if I am doing something wrong.
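One way to check this directly is a sketch like the following, assuming random per-step logits stand in for the model outputs (the variable names are hypothetical). It uses tf.GradientTape to take the gradient of the weighted loss with respect to the logits and prints the largest absolute gradient at each time step.

```python
import numpy as np
import tensorflow as tf

# Toy setup (assumption): 1 sequence, 7 time steps, 3 classes.
num_steps, num_classes = 7, 3
rng = np.random.default_rng(0)

# Per-time-step logits we differentiate with respect to (stand-in for model outputs).
logits = tf.Variable(rng.normal(size=(1, num_steps, num_classes)).astype("float32"))

# One-hot targets, one per time step.
labels = tf.one_hot(rng.integers(0, num_classes, size=(1, num_steps)), num_classes)

# Per-time-step weights: the first three steps get weight 0.
sample_weight = tf.constant([[0, 0, 0, 1, 1, 1, 1]], dtype=tf.float32)

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

with tf.GradientTape() as tape:
    loss = loss_fn(labels, logits, sample_weight=sample_weight)

# Gradient of the loss w.r.t. every logit, shape (1, 7, 3).
grad = tape.gradient(loss, logits)

# Largest absolute gradient at each time step.
max_grad_per_step = tf.reduce_max(tf.abs(grad), axis=-1)[0].numpy()
print(max_grad_per_step)
```

The first three entries come out as exactly zero, because each step's loss is multiplied by its weight before reduction. Note this only looks at the gradient with respect to the per-step outputs: in a recurrent model, the zero-weighted steps are still processed and still feed the hidden state that the later, weighted steps depend on, which is one way sample weights behave differently from a Masking layer.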
Thanks in advance