Context
The input to my model is a BatchDataset object called dataset_train, and it is batched to yield (training_data, label).

For some of the machinery in my model, I need to be able to split the Dataset tuple inside the model and independently access both the data and the label. This is a single-input model with multiple outputs, so I am using TensorFlow's Functional API. I am working with time series; for the sake of reproducibility, a toy dataset would look like this:

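import numpy as np
import tensorflow as tf

# Semi-arbitrary hyperparameters (placeholder values for the toy example)
batch_size = 32
sequence_length = 10

time = np.arange(1000)
data = np.random.randn(1000)
label = np.random.randn(1000)

# Column 0 holds the time stamps, column 1 the measurements
training_data = np.zeros(shape=(time.size, 2))
training_data[:, 0] = time
training_data[:, 1] = data

dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data=training_data,
    targets=label,
    batch_size=batch_size,
    sequence_length=sequence_length,
    sequence_stride=1,
)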

Note: sequence_length and batch_size are additional semi-arbitrary hyperparameters that are not important for the purposes of this question.
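To make the structure of the batches concrete, here is a minimal sketch that pulls one element out of the toy dataset and looks at its shapes. The values batch_size = 32 and sequence_length = 10 are assumptions chosen only for illustration; they are not from the original question.

```python
import numpy as np
import tensorflow as tf

# Assumed placeholder hyperparameters (not given in the question)
sequence_length = 10
batch_size = 32

time = np.arange(1000)
training_data = np.stack([time, np.random.randn(1000)], axis=1)
label = np.random.randn(1000)

dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data=training_data,
    targets=label,
    sequence_length=sequence_length,
    sequence_stride=1,
    batch_size=batch_size,
)

# Each element of the dataset is a (data_batch, label_batch) tuple
x_batch, y_batch = next(iter(dataset_train))
x_shape = tuple(x_batch.shape)  # (batch, sequence_length, features)
y_shape = tuple(y_batch.shape)  # one label per window in the batch
print(x_shape, y_shape)
```

Note that this unpacking happens in eager Python code outside the model; it is only a way to inspect what the dataset yields.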

Question
How do I split the Dataset into the training data element and the label element inside TensorFlow's Functional API? Here is pseudocode of what I am looking for:

input = Single Input Layer that defines something capable of accepting dataset_train

training_data  = input.element_spec[0]
label = input.element_spec[1]

After that point, my model can perform its actions on training_data and label independently.

First Solution I tried:
I first started by trying to define two input layers, pass each element of the dataset tuple to its own input layer, and then act on each input layer independently.

training_data = tf.keras.Input(shape=(sequence_length,2))
label = tf.keras.Input(shape = sequence_length)

#model machinery

model = tf.keras.Model(
    inputs = [training_data, label],
    outputs = [output_1, output_2]
)

#model machinery

history = model.fit(dataset_train, epochs = 500)

The first problem I had with this is that I got the following error:

ValueError: Layer "model_5" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, 2) dtype=float64>]

This is a problem: even if I passed the model a dictionary of datasets (never mind that this isn't supported), I would introduce a circular dependency, because model.predict would then expect labels as part of its inputs. In other words, I would need the answers to get the answers. To avoid this, I need to pass the model only a single Dataset (TensorFlow implicitly treats the second element of a Dataset as the label, and model.predict doesn't require the Dataset to contain labels). So I abandoned this strategy in favor of unpacking the input directly within the model's Functional API definition.

Second Solution I tried:
I thought maybe I could unpack the Dataset using the .get_single_element() method, as in the following code excerpt:

input = tf.keras.Input(shape = (sequence_length, 2))
training_dataset, label = input.get_single_element()

This gave the following error:

AttributeError: 'KerasTensor' object has no attribute 'get_single_element'

I then thought the problem was that because the symbolic tensor wasn't of type Dataset, I needed to define the input layer to expect a Dataset. After reading through the documentation and spending ~9 hours messing around, I realized that tf.keras.Input takes an argument called type_spec, which allows the user to specify exactly the type of symbolic tensor to create (I think - I'm still a little shaky on understanding exactly what's going on and I'm more than a little sleep deprived, which isn't helping). As it turns out there's a way to generate the type_spec from the dataset itself, so I did that to make sure that I wasn't making a mistake in generating it.

input = tf.keras.Input(tensor = dataset_train)
training_dataset, label = input.get_single_element()

Which gives the following error:

AttributeError: 'BatchDataset' object has no attribute 'dtype'

I'm not really sure why I get this error, but I tried to circumvent it by explicitly defining the type_spec in the Input layer:

input = tf.keras.Input(type_spec = tf.data.DatasetSpec.from_value(dataset_train))
training_dataset, label = input.get_single_element()

Which gives the following error:

ValueError: KerasTensor only supports TypeSpecs that have a shape field; got DatasetSpec, which does not have a shape.

I had also tried building the DatasetSpec manually instead of generating it with .from_value, and got the same error. At the time I assumed I was just constructing it incorrectly, but now that .from_value produces the same error, I'm beginning to suspect that this whole line of solutions won't work because DatasetSpec has no shape field. I might also be confused, because inspecting dataset_train.element_spec clearly shows that the dataset's elements do have shapes, so I'm not sure why TensorFlow can't infer one from them.
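The distinction behind that error can be sketched as follows: the spec of the dataset as a whole describes a container and carries no shape of its own, while the specs of its elements do. This is a small illustration under assumed hyperparameters (sequence_length = 10, batch_size = 32), not a fix for the approach above.

```python
import numpy as np
import tensorflow as tf

# Assumed placeholder hyperparameters (not given in the question)
sequence_length = 10
batch_size = 32

time = np.arange(1000)
training_data = np.stack([time, np.random.randn(1000)], axis=1)
label = np.random.randn(1000)

dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data=training_data,
    targets=label,
    sequence_length=sequence_length,
    batch_size=batch_size,
)

# Spec describing the dataset itself (the container)
spec = tf.data.DatasetSpec.from_value(dataset_train)
has_shape = hasattr(spec, "shape")  # the container spec exposes no shape

# Specs describing what each element of the dataset looks like
data_spec, label_spec = spec.element_spec
print(has_shape)
print(data_spec.shape, label_spec.shape)
```

So the shapes visible in dataset_train.element_spec belong to the batches the dataset yields, which is presumably why a symbolic Keras input cannot be built from the DatasetSpec directly.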

Any help in furthering either of those non-functional solutions so that I can explicitly access the training_data and label separately from an input Dataset inside the Functional API would be much appreciated!

Answer

You don't need to split your dataset into x_train and y_train datasets! Keras will do this for you. All you need to do is reformat your (input, output_1_label, output_2_label) tuple to (input, {'named_output_1': output_1_label, 'named_output_2': output_2_label}). I did my reformatting inside a tf.data.Dataset.map call.

I assume you wanted to split your tuple in order to call model.fit(x=x_train, y={'out1': y_train_1, 'out2': y_train_2}, ...).

Instead, with your input/output tuples you just call model.fit(x_train_tuples, ...).
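As a rough sketch of this idea on the toy dataset from the question: the map call below turns each (input, label) batch into (input, {'out1': ..., 'out2': ...}), and the model's outputs are named to match the dictionary keys. Several things here are assumptions made only for illustration: the helper name to_named_outputs, the hyperparameter values, the tiny pooling-plus-dense architecture, and the reuse of the single toy label for both outputs (the toy dataset has only one label).

```python
import numpy as np
import tensorflow as tf

# Assumed placeholder hyperparameters (not given in the question)
sequence_length = 10
batch_size = 32

time = np.arange(1000)
training_data = np.stack([time, np.random.randn(1000)], axis=1).astype("float32")
label = np.random.randn(1000).astype("float32")

dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data=training_data,
    targets=label,
    sequence_length=sequence_length,
    batch_size=batch_size,
)

def to_named_outputs(x, y):
    """Hypothetical helper: regroup (input, label) into (input, dict of labels)."""
    y = tf.expand_dims(y, axis=-1)  # match the (batch, 1) shape of each output
    # Assumption: the same toy label is reused for both outputs
    return x, {"out1": y, "out2": y}

dataset_named = dataset_train.map(to_named_outputs)

# Single input, two outputs whose names match the dictionary keys
inputs = tf.keras.Input(shape=(sequence_length, 2))
hidden = tf.keras.layers.GlobalAveragePooling1D()(inputs)
out1 = tf.keras.layers.Dense(1, name="out1")(hidden)
out2 = tf.keras.layers.Dense(1, name="out2")(hidden)
model = tf.keras.Model(inputs=inputs, outputs={"out1": out1, "out2": out2})
model.compile(optimizer="adam", loss={"out1": "mse", "out2": "mse"})

# Keras separates inputs from labels itself; no manual splitting is needed
history = model.fit(dataset_named, epochs=1, verbose=0)

# Prediction only needs the inputs, so no labels are required here
features_only = dataset_train.map(lambda x, y: x)
predictions = model.predict(features_only, verbose=0)
```

Because the labels never enter the model as inputs, model.predict can be called on data without labels, which avoids the circular dependency described in the question.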