Keras concatenated model: ValueError: Data cardinality is ambiguous


I am creating a concatenated model using Keras. For now, I am keeping it simple, using only Dense layers and no hyperparameter optimization. My model should accept data from two different datasets, each with a different number of samples.

After creating and compiling the model, when I try to fit the model on the two datasets, I get this error:

ValueError: Data cardinality is ambiguous:
  x sizes: 2093, 807
  y sizes: 2093, 807
Make sure all arrays contain the same number of samples.

2093 and 807 are the number of rows (samples) in the two datasets.
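(For context on why Keras raises this: fit() slices every input array along the sample axis and pairs the slices up index-by-index, much like Python's zip, so a "training example" is only well-defined when all arrays agree on the number of samples. A numpy-only analogy, not the actual Keras internals:)

```python
import numpy as np

# Arrays with the shapes from the question
x_diags = np.zeros((2093, 1032))
x_labs = np.zeros((807, 230))

# fit() needs an i-th sample from BOTH inputs to form one training example.
# zip() silently truncates to the shorter array; Keras raises ValueError instead.
pairs = list(zip(x_diags, x_labs))
print(len(pairs))  # 807, not 2093
```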

I would have expected each base model to learn independently of the other, using only the input data available to it, and the concatenated model to then output a prediction based on the characteristics of each sample in the test set. I know I could pad the two datasets, adding rows of zeros for each sample that has no measurements in one dataset, but I would prefer to avoid that if possible. Does anyone know a workaround for this kind of problem?
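(For reference, the zero-padding workaround mentioned above could look like the sketch below. It assumes the two sample sets are disjoint, so the padded arrays are stacked over the union of all 2093 + 807 samples; np.random stands in for the real data, and the targets would need the same alignment.)

```python
import numpy as np

n_diags, n_labs = 2093, 807          # sample counts from the question
x_diags = np.random.rand(n_diags, 1032)
x_labs = np.random.rand(n_labs, 230)

# Stack each input over the union of samples, filling zero rows
# where a sample has no measurements in that dataset.
n_total = n_diags + n_labs
x_diags_padded = np.vstack([x_diags, np.zeros((n_labs, 1032))])
x_labs_padded = np.vstack([np.zeros((n_diags, 230)), x_labs])
```

After this, both arrays have n_total rows and can be passed to fit() together, at the cost of many all-zero rows.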

I checked similar questions, but they mostly hit this error by accident, where the mismatched cardinality is a bug; in my model the differing sample counts are intentional.

Thanks in advance

EDIT: here is my code, in case it helps.

import tensorflow as tf

print(x_train_diags.shape)
print(x_train_labs.shape)
print(y_train_diags.shape)
print(y_train_labs.shape)

# Branch for the diagnoses dataset (1032 features)
input_diags = tf.keras.layers.Input(shape=(1032,))
dense_1_diags = tf.keras.layers.Dense(16, activation=tf.keras.activations.elu)(input_diags)
dense_2_diags = tf.keras.layers.Dense(4, activation=tf.keras.activations.elu)(dense_1_diags)

# Branch for the labs dataset (230 features)
input_labs = tf.keras.layers.Input(shape=(230,))
dense_1_labs = tf.keras.layers.Dense(16, activation=tf.keras.activations.elu)(input_labs)
dense_2_labs = tf.keras.layers.Dense(4, activation=tf.keras.activations.elu)(dense_1_labs)

# Merge the two branches and predict a single output
concatenation_layer = tf.keras.layers.Concatenate()([dense_2_diags, dense_2_labs])

output = tf.keras.layers.Dense(units=1, activation=tf.keras.activations.elu)(concatenation_layer)

full_model = tf.keras.Model(inputs=[input_diags, input_labs], outputs=[output])

full_model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                   optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                   metrics=[tf.keras.metrics.AUC(), tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
full_model.fit([x_train_diags, x_train_labs], [y_train_diags, y_train_labs], batch_size=64, epochs=5)

The results of the print statements are:

(2093, 1032)
(807, 230)
(2093, 1)
(807, 1)

so everything is as expected there.
