This Keras tutorial shows how to embed a text field and pass it to TFDF:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_decision_forests as tfdf
from tensorflow import keras
from tensorflow.keras import layers

sentence_encoder_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4"
)
inputs = layers.Input(shape=(), dtype=tf.string)
outputs = sentence_encoder_layer(inputs)
preprocessor = keras.Model(inputs=inputs, outputs=outputs)
model_1 = tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor)
This example is for a dataset with one text field as input. What if I have 3 or 4 text fields that need embedding? I tried applying the sentence_encoder_layer to each input and passing the results to the preprocessor, but it fails.
Here is the code I'm trying:
FEATURES = ['feat1', 'feat2', 'feat3', 'feat4']
def create_model_inputs():
    inputs = {}
    for feature_name in FEATURES:
        inputs[feature_name] = layers.Input(
            name=feature_name, shape=(), dtype=tf.string
        )
    return inputs

def create_encoder_inputs(inputs):
    encoded_features = []
    for feature_name in inputs:
        layer = sentence_encoder_layer(inputs[feature_name])
        encoded_features.append(layer)
    return encoded_features
preprocessor = keras.Model(inputs=inputs, outputs=outputs)
model_1 = tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor)
I got the following error while trying this.
ValueError: Layer "model" expects 4 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'data:0' shape=(None, 4) dtype=string>]
I tried the above code, and also tried concatenating all four encoded layers with layers.concatenate before returning, but I still get errors.
The error comes from the incorrect use of the inputs and outputs variables when creating the preprocessor model: create_model_inputs() and create_encoder_inputs() are defined, but their results are never wired into keras.Model. The correct approach is to pass the list of inputs to the model and concatenate the encoded features into a single output tensor.
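A minimal sketch of that wiring, assuming TensorFlow is installed. Here toy_encoder is a hypothetical stand-in for the hub.KerasLayer above (a hash-based one-hot encoding) so the sketch runs offline; in the real code you would call sentence_encoder_layer on each input instead.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

FEATURES = ['feat1', 'feat2', 'feat3', 'feat4']

# Stand-in for hub.KerasLayer(".../universal-sentence-encoder/4"):
# hashes each string into a 16-dim one-hot vector so no download is needed.
def toy_encoder(x):
    hashed = tf.strings.to_hash_bucket_fast(x, num_buckets=16)  # (batch,)
    return tf.one_hot(hashed, depth=16)                         # (batch, 16)

# One string input per text feature.
inputs = {name: layers.Input(name=name, shape=(), dtype=tf.string)
          for name in FEATURES}

# Encode each feature, then concatenate into a single tensor output.
encoded = [layers.Lambda(toy_encoder)(inputs[name]) for name in FEATURES]
outputs = layers.Concatenate()(encoded)  # (batch, 4 * 16)

# Pass the list of inputs and the single concatenated output.
preprocessor = keras.Model(inputs=list(inputs.values()), outputs=outputs)
```

The resulting preprocessor can then be passed to tfdf.keras.GradientBoostedTreesModel(preprocessing=preprocessor) as in the question.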
My output: