Smartest way to add KL Divergence into (Variational) Auto Encoder


I have an Auto Encoder model with multiple outputs and loss weighting, which I want to extend into a Variational Auto Encoder. I followed the official Keras tutorial: https://keras.io/examples/generative/vae/.

But if I manually adapt the train_step function, I lose most of my original implementation details:

  1. I have two weighted optimization goals: reconstruction (decoder) and classification (softmax)
  2. accuracy metrics for the classification
  3. the original fit method also takes care of the validation data and the corresponding metrics (see the sketch below for the kind of compile/fit setup I want to keep)
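For context, a minimal sketch of what the non-variational setup roughly looks like; the layer sizes, output names (`decoder`, `classifier`) and loss weights are illustrative assumptions, not my actual architecture:

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical two-headed autoencoder: reconstruction + classification.
inputs = keras.Input(shape=(784,), name="features")
h = keras.layers.Dense(64, activation="relu")(inputs)
code = keras.layers.Dense(16, activation="relu", name="code")(h)
recon = keras.layers.Dense(784, name="decoder")(
    keras.layers.Dense(64, activation="relu")(code))
probs = keras.layers.Dense(10, activation="softmax", name="classifier")(code)
model = keras.Model(inputs, [recon, probs])

# The standard compile/fit machinery handles loss weighting, metrics and
# validation data for me -- this is what I do not want to re-implement.
model.compile(
    optimizer="adam",
    loss={"decoder": "mse", "classifier": "categorical_crossentropy"},
    loss_weights={"decoder": 0.8, "classifier": 0.2},
    metrics={"classifier": "accuracy"},
)
# model.fit(x_train, {"decoder": x_train, "classifier": y_train},
#           validation_data=(x_val, {"decoder": x_val, "classifier": y_val}))
```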

Adding the suggested sampling layer from the Keras tutorial is no problem, but correctly implementing the Kullback-Leibler loss is: it depends on the additional parameters z_mu and z_log_var, which standard Keras losses (functions of y_true and y_pred only) do not support.
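For reference, a minimal sketch of the sampling layer and the KL term in the style of the Keras tutorial; the helper name `kl_divergence` is just for illustration:

```python
import tensorflow as tf
from tensorflow import keras

class Sampling(keras.layers.Layer):
    """Reparameterization trick: draw z from N(z_mean, exp(z_log_var))."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

def kl_divergence(z_mean, z_log_var):
    # KL(N(z_mean, exp(z_log_var)) || N(0, 1)), summed over latent dims and
    # averaged over the batch -- it needs z_mean/z_log_var, not y_true/y_pred.
    per_sample = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    return tf.reduce_mean(per_sample)
```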

I searched for workarounds to solve this issue, but none of them was successful:

  1. re-writing the train_step: it is hard to fully re-implement all the details (weighting, multiple losses with different inputs -> decoder: data, classifier: labels, etc.)
  2. adding a pseudo layer to the encoder that calculates the loss, as done here: https://tiao.io/post/tutorial-on-variational-autoencoders-with-a-concise-keras-implementation/. The problem is that add_loss does not specify under which key, or how, the KL loss is added to the model's total loss (see the sketch after this list)
  3. adding everything as a global/top-level element so that z_mu and z_log_var are accessible for the loss calculation, like here: https://www.machinecurve.com/index.php/2019/12/30/how-to-create-a-variational-autoencoder-with-keras/. This is the approach I like the least, as my current architecture is parametrized, e.g. to be able to perform hyperopt tuning
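For illustration, a sketch of the pseudo-layer idea from point 2; the class name `KLDivergenceLayer` and the surrounding encoder are assumptions, not the linked tutorial's exact code:

```python
import tensorflow as tf
from tensorflow import keras

class KLDivergenceLayer(keras.layers.Layer):
    """Identity layer that registers the KL term via add_loss."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        self.add_loss(tf.reduce_mean(kl))
        return inputs  # tensors pass through unchanged

latent_dim = 2
enc_in = keras.Input(shape=(784,))
h = keras.layers.Dense(64, activation="relu")(enc_in)
z_mean = keras.layers.Dense(latent_dim, name="z_mu")(h)
z_log_var = keras.layers.Dense(latent_dim, name="z_log_var")(h)
z_mean, z_log_var = KLDivergenceLayer()([z_mean, z_log_var])
# The KL term now ends up in the model's total loss, but it is not reported
# under its own key in the training logs.
```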

I was not able to find a pleasing solution to this problem. As VAEs are more and more popular, I am surprised that there is no extended tutorial about this, especially for models with multiple inputs and outputs. Or maybe I am just unable to find the right answers with my search terms.

Any opinions welcome!


1 Answer


After a couple of re-designs and some bug-ticket tracing, I found this recent example: here

The VAE examples can be found at the very bottom of the post.

  1. Solution: write your own train_step: the cleanest but also the hardest solution, depending on how complex your loss calculation is (see the sketch after this list).
  2. Solution: use a functional approach to access the necessary variables and add the loss with .add_loss: not very clean, but straightforward to implement (you lose a separate loss tracker for the KL loss).
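A rough sketch of what the first solution can look like for a two-headed model; the sub-models, weight names (`recon_weight`, `clf_weight`, `kl_weight`) and metric names are assumptions, not the exact code from the linked example:

```python
import tensorflow as tf
from tensorflow import keras

class MultiHeadVAE(keras.Model):
    """Hypothetical VAE with a reconstruction head and a classification head."""
    def __init__(self, encoder, decoder, classifier,
                 recon_weight=1.0, clf_weight=1.0, kl_weight=1.0, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder      # assumed to return (z_mean, z_log_var, z)
        self.decoder = decoder
        self.classifier = classifier
        self.recon_weight = recon_weight
        self.clf_weight = clf_weight
        self.kl_weight = kl_weight
        self.acc = keras.metrics.CategoricalAccuracy(name="clf_accuracy")

    def train_step(self, data):
        x, y = data  # labels for the classifier; the decoder reconstructs x
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(x, training=True)
            x_hat = self.decoder(z, training=True)
            y_hat = self.classifier(z, training=True)
            recon_loss = tf.reduce_mean(keras.losses.mse(x, x_hat))
            clf_loss = tf.reduce_mean(
                keras.losses.categorical_crossentropy(y, y_hat))
            kl_loss = tf.reduce_mean(-0.5 * tf.reduce_sum(
                1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
            total = (self.recon_weight * recon_loss
                     + self.clf_weight * clf_loss
                     + self.kl_weight * kl_loss)
        grads = tape.gradient(total, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.acc.update_state(y, y_hat)
        return {"loss": total, "recon_loss": recon_loss, "clf_loss": clf_loss,
                "kl_loss": kl_loss, "clf_accuracy": self.acc.result()}
```

A matching test_step would be needed to get the validation metrics back, which is part of why this route is the most work.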

To keep my weighting, I scaled the KL loss by the weight of my decoder loss before adding it via .add_loss.
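In code, that amounts to something like the following; the weight value and tensor names are placeholders:

```python
import tensorflow as tf

decoder_loss_weight = 0.8  # hypothetical value taken from my loss_weights

def weighted_kl(z_mean, z_log_var):
    kl = tf.reduce_mean(-0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
    return decoder_loss_weight * kl

# inside the functional model definition:
# model.add_loss(weighted_kl(z_mean, z_log_var))
```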

Note: The first solution I tested was to define a custom loss function for the combined mse+kl loss and add it to my functionally designed model. This works if one turns off TF eager evaluation. But be careful: this really slows down your network, and you lose the ability to monitor your training via TensorBoard if you don't have admin rights for your NVIDIA GPU (profile_batch=0 does not turn off profiling if eager mode is switched off, so you run into INSUFFICIENT_PRIVILEGES errors with the CUPTI driver).
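For completeness, a rough sketch of that discarded attempt; the closure-based loss and the output name are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow import keras

def make_mse_kl_loss(z_mean, z_log_var, kl_weight=1.0):
    """Closure that mixes reconstruction MSE with the KL term."""
    def mse_kl(y_true, y_pred):
        mse = tf.reduce_mean(keras.losses.mse(y_true, y_pred))
        kl = tf.reduce_mean(-0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        return mse + kl_weight * kl
    return mse_kl

# Only worked with eager execution disabled, which was too slow:
# model.compile(optimizer="adam",
#               loss={"decoder": make_mse_kl_loss(z_mean, z_log_var)},
#               run_eagerly=False)

# profile_batch=0 did not disable profiling once eager mode was off,
# hence the CUPTI INSUFFICIENT_PRIVILEGES errors without admin rights.
tensorboard_cb = keras.callbacks.TensorBoard(log_dir="./logs", profile_batch=0)
```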