The SELU activation function (https://github.com/bioinf-jku/SNNs/blob/master/selu.py) requires its input to be normalized to a mean of 0.0 and a variance of 1.0. Therefore, I tried to apply tf.layers.batch_normalization (axis=-1) to the raw data to meet that requirement. The raw data in each batch have the shape [batch_size, 15], where 15 is the number of features. The graph below shows the variances of 5 of these features returned by tf.layers.batch_normalization (~20 epochs). They are not all close to 1.0 as expected, and the mean values are not all close to 0.0 either (graphs not shown).
How can I get all 15 features normalized independently, so that every feature after normalization has mean = 0.0 and variance = 1.0?
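Roughly the setup described above, as a minimal sketch (the placeholder names and the dense SELU layer are mine; tf.nn.selu stands in for the linked selu.py implementation):

    import tensorflow as tf

    # Raw features of shape [batch_size, 15], as described above.
    x = tf.placeholder(tf.float32, shape=[None, 15], name="raw_features")
    is_training = tf.placeholder(tf.bool, name="is_training")

    # What I tried: batch-normalize the raw features along the feature axis
    # before feeding them into a SELU layer.
    x_bn = tf.layers.batch_normalization(x, axis=-1, training=is_training)
    hidden = tf.layers.dense(x_bn, 64, activation=tf.nn.selu)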

After reading the original papers on batch normalization (https://arxiv.org/abs/1502.03167) and SELU (https://arxiv.org/abs/1706.02515), I have a better understanding of both:
Batch normalization is an "isolation" procedure: it ensures that the input to the next layer (within any mini-batch) has a fixed distribution, which is how the so-called "internal covariate shift" problem is addressed. The affine transform ( γ*x^ + β ) then tunes the standardized x^ to another fixed distribution for better expressiveness. For plain standardization only, the center and scale parameters have to be set to False when calling tf.layers.batch_normalization.
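A toy NumPy illustration of the two steps (gamma and beta are arbitrary stand-ins for the learned parameters), which also explains why the outputs in the graph above are not at mean 0 / variance 1:

    import numpy as np

    x = np.random.randn(8, 15) * 3.0 + 5.0   # one mini-batch, shape [batch_size, 15]
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    eps = 1e-8

    # Standardization step: each feature now has mean ~0 and variance ~1.
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Affine step (center=True, scale=True): the learned gamma and beta move
    # the output away from mean 0 / variance 1 again.
    gamma, beta = 1.7, 0.3
    y = gamma * x_hat + beta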
Make sure epsilon (also an argument of tf.layers.batch_normalization) is set at least two orders of magnitude below the smallest magnitude in the input data. The default value of epsilon is 0.001. In my case, some features have values as low as 1e-6, so I had to change epsilon to 1e-8.
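A minimal sketch of the call with those settings (the placeholder tensors are assumptions):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 15])
    is_training = tf.placeholder(tf.bool)

    # Plain standardization only: no learned offset/scale, and a smaller
    # epsilon than the default 0.001.
    x_bn = tf.layers.batch_normalization(
        x,
        axis=-1,
        center=False,   # no beta offset
        scale=False,    # no gamma scaling
        epsilon=1e-8,
        training=is_training)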
The inputs to SELU have to be normalized before they are fed into the model; tf.layers.batch_normalization is not designed for that purpose.
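One way to do that pre-normalization, as a minimal sketch (the helper name and the eps guard are my own; the statistics are computed on the training set only and reused for the test set):

    import numpy as np

    def standardize(train_x, test_x, eps=1e-8):
        """Standardize each of the 15 features independently, using
        statistics computed on the training set only."""
        mean = train_x.mean(axis=0)
        std = train_x.std(axis=0)
        return (train_x - mean) / (std + eps), (test_x - mean) / (std + eps)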