Incompatible shape issue with a mixture density network in tensorflow/keras

I'm stuck on a problem while building a mixture density network with Keras (using the TensorFlow backend). The goal of this MDN is to learn a latent representation of an image (so that the MDN's predictions can be plugged into an autoencoder). I would like to model my input image as a mixture of multivariate normal distributions and get, as network output, a mu and a sigma vector (of dimension 64 each) plus a set of N weights alpha (where N is the number of components in the mixture). If every parameter, including alpha, has an output of dimension 64, everything works fine, but it makes no sense to have more alpha factors than components (which would mean a 64-dimensional alpha in my case). As soon as I give alpha a different shape from mu and sigma, problems appear.
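
To make this concrete, here is a minimal standalone sketch of the distribution I want each network output to parametrize (N = 3 and the placeholder parameter values are only examples for checking shapes):

import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions

N = 3            # number of mixture components (example value)
latent_dim = 64  # dimension of the latent vector

# Placeholder parameters for a single sample, only to illustrate the shapes:
alpha = np.full((N,), 1.0 / N, dtype=np.float32)                          # (N,) mixture weights
mu    = np.random.randn(N, latent_dim).astype(np.float32)                 # (N, 64) component means
sigma = np.abs(np.random.randn(N, latent_dim)).astype(np.float32) + 1e-3  # (N, 64) diagonal scales

gm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=alpha),
    components_distribution=tfd.MultivariateNormalDiag(loc=mu, scale_diag=sigma))

y = np.random.randn(latent_dim).astype(np.float32)  # one 64-dimensional latent vector
log_p = gm.log_prob(y)                              # scalar log-likelihood for that vector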

To get a covariance matrix to use with the MixtureSameFamily distribution of TensorFlow Probability, I build a diagonal matrix from the sigma vector. I found the following loss function (a negative log-likelihood) on several forums and tried to adapt it to my problem:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def slice_parameter_vectors(parameter_vector):
    # components (the number of mixture components) is defined globally
    return tf.split(parameter_vector, [1*components, 64*components, 64*components], axis=1)

def gnll_loss(y, parameter_vector):
    alpha, mu, sigma = slice_parameter_vectors(parameter_vector)  # Unpack parameter vectors

    gm = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=alpha),
        components_distribution=tfd.MultivariateNormalDiag(
            loc=mu,
            scale_diag=sigma))

    log_likelihood = gm.log_prob(tf.transpose(y))                 # Evaluate log-probability of y

    return -tf.reduce_mean(log_likelihood, axis=-1)
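
For reference, this is what the split gives me (taking components = 2 purely as an example): mu and sigma stay flat, which I suspect is part of the problem.

import numpy as np
import tensorflow as tf

components = 2   # example value only
batch_size = 2   # the batch size from the error below

pvec = np.zeros((batch_size, 1*components + 64*components + 64*components), np.float32)
alpha, mu, sigma = tf.split(pvec, [1*components, 64*components, 64*components], axis=1)

# alpha: (2, 2), mu: (2, 128), sigma: (2, 128)
# mu and sigma are still flat, so MultivariateNormalDiag interprets each row as one
# 128-dimensional Gaussian instead of `components` Gaussians of dimension 64.
print(alpha.shape, mu.shape, sigma.shape)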

When I compile the network and feed it some data, I always get this error:

InvalidArgumentError: Incompatible shapes: [64,1,2] vs. [2,64]
     [[{{node loss_20/concatenate_6_loss/MultivariateNormalDiag/log_prob/affine_linear_operator/inverse/sub}} = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss_20/concatenate_6_loss/MixtureSameFamily/log_prob/pad_sample_dims/Reshape, loss_20/concatenate_6_loss/split:1)]]
     [[{{node loss_20/mul/_4221}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1846_loss_20/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Here, 64 is the dimension of the latent vector and 2 is the batch size. Using a batch size of 1 works, but it gives me NaN as a loss.

Here is how the alpha, mu and sigma layers are built:

    fc     = Dense((no_parameters-1) * components*64 + components, activation="tanh", name="fc")(layer)
    alphas = Dense(1*components, activation="softmax", name="alphas")(fc)  # mixture weights
    mus    = Dense(64*components, name="mus")(fc)                          # component means
    sigmas = Dense(64*components, activation=nnelu, name="sigmas")(fc)     # diagonal scales, kept positive by nnelu
    pvec   = Concatenate(axis=1)([alphas, mus, sigmas])
    mdn    = Model(inputs=inputs, outputs=pvec)
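
My current guess is that mu and sigma need to be reshaped to (batch, components, 64) before building the mixture, so that there are components separate 64-dimensional Gaussians per sample instead of one flat vector. Something along these lines is what I imagine the loss should look like, but I am not sure whether this is the right way to use the batch/event shape semantics of MixtureSameFamily:

def gnll_loss_reshaped(y, parameter_vector):
    alpha, mu, sigma = slice_parameter_vectors(parameter_vector)

    # Reshape the flat slices: one row of 64 values per mixture component
    mu    = tf.reshape(mu,    [-1, components, 64])
    sigma = tf.reshape(sigma, [-1, components, 64])

    gm = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=alpha),           # (batch, components)
        components_distribution=tfd.MultivariateNormalDiag(
            loc=mu,                                                   # (batch, components, 64)
            scale_diag=sigma))

    # y has shape (batch, 64), matching the event shape, so no transpose here
    log_likelihood = gm.log_prob(y)

    return -tf.reduce_mean(log_likelihood)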

So my question is: is it possible to do this with TensorFlow? Is anybody else here using this kind of network, and could you explain to me how to deal with the loss function?

Kind regards,

Adrien
