How to pass multiple embedded layers into TFDF using Keras?


The Keras tutorial shows how to embed a text field and pass it to TFDF:

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_decision_forests as tfdf

sentence_encoder_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4"
)

inputs = layers.Input(shape=(), dtype=tf.string)
outputs = sentence_encoder_layer(inputs)
preprocessor = keras.Model(inputs=inputs, outputs=outputs)
model_1 = tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor)

This example is for a dataset with a single text field as input. What if I have 3 or 4 text fields that need embedding? I tried passing each field through sentence_encoder_layer, concatenating the results, and handing them to the preprocessor, but it's failing.

Here is the code I'm trying:

FEATURES = ['feat1', 'feat2', 'feat3', 'feat4']

def create_model_inputs():
    inputs = {}

    for feature_name in FEATURES:
        inputs[feature_name] = layers.Input(
            name=feature_name, shape=(), dtype=tf.string
        )

    return inputs

def create_encoder_inputs(inputs):
    encoded_features = []
    for feature_name in inputs:
        layer = sentence_encoder_layer(inputs[feature_name])
        encoded_features.append(layer)

    return encoded_features

preprocessor = keras.Model(inputs=inputs, outputs=outputs)
model_1 = tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor)

I got the following error while trying this:

ValueError: Layer "model" expects 4 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'data:0' shape=(None, 4) dtype=string>]

I tried the above code and also tried concatenating all four encoded outputs before returning them, using layers.concatenate (roughly as in the sketch below), but I'm still getting errors.
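The concatenation variant was roughly like this (a sketch only; the exact code isn't shown above, and the 512-dim embedding size is assumed from the Universal Sentence Encoder):

def create_encoder_inputs(inputs):
    encoded_features = []
    for feature_name in inputs:
        # Each text feature becomes a 512-dim embedding
        encoded_features.append(sentence_encoder_layer(inputs[feature_name]))
    # Concatenate the per-feature embeddings into a single (None, 4 * 512) tensor
    return layers.concatenate(encoded_features)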

BEST ANSWER

The issue is the incorrect use of the inputs and outputs variables when creating the preprocessor model. The correct approach is to pass the full list of inputs to keras.Model and to concatenate the encoded features into a single output tensor.

import tensorflow_hub as hub
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_decision_forests as tfdf

# List of feature names for text fields
FEATURES = ['feat1', 'feat2', 'feat3', 'feat4']

# Create a function to define inputs for all the text fields
def create_model_inputs():
    inputs = {}
    for feature_name in FEATURES:
        inputs[feature_name] = layers.Input(name=feature_name, shape=(), dtype=tf.string)
    return inputs

# Create a function to pass each input through the sentence_encoder_layer
def create_encoder_inputs(inputs):
    encoded_features = []
    for feature_name in FEATURES:
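        # Note: hub.KerasLayer is instantiated once per feature, so each feature
        # gets its own copy of the encoder weights (4 x ~257M parameters, which is
        # the ~1.03B total shown in the summary below). Reuse a single layer
        # instance if you want the features to share one encoder.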
        layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
        encoded_feature = layer(inputs[feature_name])
        encoded_features.append(encoded_feature)
    return encoded_features

# Concatenate the encoded features
def create_preprocessor(inputs):
    encoded_features = create_encoder_inputs(inputs)
    concatenated_features = layers.concatenate(encoded_features)
    return keras.Model(inputs=list(inputs.values()), outputs=concatenated_features)

# Create the preprocessor model
inputs = create_model_inputs()
preprocessor = create_preprocessor(inputs)

# Create the TFDF model
model = tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor)

# Build the model
model.build(input_shape=(None,))

model.summary()

My output:

Use /tmp/tmp0sabggaj as temporary training directory
Warning: The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None,), dtype=float32)
WARNING:absl:The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None,), dtype=float32)
Model: "gradient_boosted_trees_model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 model_2 (Functional)        (None, 2048)              1027191296
                                                                 
=================================================================
Total params: 1027191297 (3.83 GB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 1027191297 (3.83 GB)
_________________________________________________________________
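
To actually train the model, the dataset needs to yield the four text features as a dict keyed by feature name, so each column can be routed to the matching named Input layer of the preprocessor. Here is a minimal sketch; the toy DataFrame and its "label" column are assumptions for illustration, not data from the question:

import pandas as pd

# Toy stand-in data: four string columns matching FEATURES plus a label.
df = pd.DataFrame({
    "feat1": ["red apple", "green pear"] * 10,
    "feat2": ["sweet", "sour"] * 10,
    "feat3": ["fruit stand", "market"] * 10,
    "feat4": ["fresh", "ripe"] * 10,
    "label": [0, 1] * 10,
})

# pd_dataframe_to_tf_dataset yields (features_dict, label) batches; the dict keys
# match the names of the Input layers, so every column reaches its encoder.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

model.fit(train_ds)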