Temperature scaling a Bayesian neural network?


I am trying to calibrate a Bayesian neural network. I have already approximated the posterior density over its weights. To make predictions the Bayesian way, I draw samples from the approximated posterior, compute the softmax probabilities for each sample, and average those probabilities to get the final output (since in Bayesian learning we want the expectation of the predicted probability with respect to the posterior weight distribution). This works well in terms of classification error, but the network is overconfident (as expected and commonly observed with complex networks), so I want to calibrate it.

I have read from multiple sources that temperature scaling is the go-to calibration method for neural networks: the softmax inputs are divided by a temperature parameter T, which is optimized to minimize the cross-entropy loss on a validation set. In other words, instead of feeding logits x_1, ..., x_n into the softmax, feed x_1/T, ..., x_n/T. However, my model does not output raw logits that pass through a softmax once; it averages the softmax of multiple sets of logits. So I am not sure how to implement temperature scaling in this scenario. Do I still fit a single temperature parameter and apply it to all logits before averaging? If so, how do I optimize it? If not, how else would one go about this? Thanks!
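For concreteness, here is a minimal sketch (in PyTorch, with made-up names like `val_logits` and `fit_temperature` that are not part of my actual code) of the variant I am considering: scale every posterior sample's logits by the same 1/T, take the softmax, average over samples, and fit T by minimizing the negative log-likelihood of the averaged predictive distribution on held-out validation data.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, lr=0.01, max_iter=200):
    """Fit a single temperature T on validation data.

    val_logits: (num_posterior_samples, num_examples, num_classes) raw logits,
                one slice per posterior weight sample.
    val_labels: (num_examples,) integer class labels.
    """
    log_T = torch.zeros(1, requires_grad=True)  # parametrize T = exp(log_T) so T stays positive
    optimizer = torch.optim.LBFGS([log_T], lr=lr, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        T = log_T.exp()
        # Scale each sample's logits by 1/T, take the softmax, then average
        # over posterior samples -- the same averaging used at prediction time.
        probs = F.softmax(val_logits / T, dim=-1).mean(dim=0)
        loss = F.nll_loss(torch.log(probs + 1e-12), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_T.exp().item()

# At test time the fitted T would be applied the same way:
# T = fit_temperature(val_logits, val_labels)
# probs = F.softmax(test_logits / T, dim=-1).mean(dim=0)
```

Is this the right way to do it, or should the temperature be applied somewhere else (e.g. to the averaged probabilities rather than to each sample's logits)?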
