I have been trying to combine image data (chest X-rays) and tabular data (age, sex, BMI, etc.) in a binary prediction model (disease: 0 or 1). I have a 2D CNN built with the Sequential API, but I have been having difficulty incorporating the tabular data. I have explored some resources online (like https://machinelearningmastery.com/keras-functional-api-deep-learning/), but most are outdated, and I haven't found any good examples. Would the Functional API be the best way? Could you please advise how I can approach this for my data below?
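import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator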
label_encoder = LabelEncoder()
df['disease'] = label_encoder.fit_transform(df['disease']).astype(str)
# Split the DataFrame into training and testing sets
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42, stratify=df['disease'])
# Image data generator
image_datagen = ImageDataGenerator(rescale=1/255)
# Train image generator
train_image_generator = image_datagen.flow_from_dataframe(
    train_data,
    x_col='filename',
    y_col='disease',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    dtype='float32'
)
# Test image generator
test_image_generator = image_datagen.flow_from_dataframe(
    test_data,
    x_col='filename',
    y_col='disease',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    dtype='float32'
)
# Convert 'Sex' feature to numeric values
train_data['Sex'] = train_data['Sex'].map({'F': 0, 'M': 1})
test_data['Sex'] = test_data['Sex'].map({'F': 0, 'M': 1})
# Tabular features for training
train_tabular_features = train_data[['age', 'Sex', 'Height', 'Weight']].values.astype(float)
# Tabular features for testing
test_tabular_features = test_data[['age', 'Sex', 'Height', 'Weight']].values.astype(float)
# Define the image input layer
img_input = Input(shape=(224, 224, 3), name='image_input')
x1 = Conv2D(16, 3, padding='same', activation='relu')(img_input)
x1 = MaxPooling2D()(x1)
x1 = Conv2D(32, 3, padding='same', activation='relu')(x1)
x1 = MaxPooling2D()(x1)
x1 = Flatten()(x1)
# Define the tabular input layer
tabular_input = Input(shape=(4,), name='tabular_input')
x2 = Dense(16, activation='relu')(tabular_input)
x2 = Dense(32, activation='relu')(x2)
x2 = Flatten()(x2)
# Concatenate the outputs of image and tabular branches
concatenated = concatenate([x1, x2])
# Common layers for combined features
x = Dense(128, activation='relu')(concatenated)
output_layer = Dense(1, activation='sigmoid', name='output')(x)
# Create the model
model = Model(inputs=[img_input, tabular_input], outputs=output_layer)
# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
history = model.fit(
    x={
        'image_input': train_image_generator,
        'tabular_input': train_tabular_features
    },
    y=train_labels,
    epochs=20,
    validation_data=(
        {
            'image_input': test_image_generator,
            'tabular_input': test_tabular_features
        },
        test_labels
    )
)
I have been getting various errors with this approach, mainly the error below. I believe my approach is wrong, so I am looking for any resources or guides.
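A minimal sketch of one possible direction, assuming the root issue is that the `model.fit` call above mixes a generator and a NumPy array inside one dict: a single generator can yield both inputs together, using the same row indices for images, tabular features, and labels so they stay aligned. The function name `paired_batches`, the in-memory `images` array, and the `seed` parameter are hypothetical and used only for illustration; the dict keys match the `name` arguments of the two `Input` layers.

```python
import numpy as np

def paired_batches(images, tabular, labels, batch_size=32, shuffle=True, seed=0):
    """Yield ({'image_input': ..., 'tabular_input': ...}, labels) batches forever.

    Assumption: `images`, `tabular`, and `labels` are NumPy arrays whose rows
    refer to the same patients in the same order.
    """
    n = len(labels)
    rng = np.random.default_rng(seed)
    while True:
        # One shared ordering per epoch keeps the three arrays aligned.
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            inputs = {
                'image_input': images[idx],
                'tabular_input': tabular[idx],
            }
            yield inputs, labels[idx]

# Tiny dummy data (not real X-rays) just to show the batch shapes.
demo_images = np.zeros((10, 8, 8, 3), dtype='float32')
demo_tabular = np.arange(40, dtype='float32').reshape(10, 4)
demo_labels = np.arange(10) % 2

demo_inputs, demo_y = next(paired_batches(demo_images, demo_tabular, demo_labels, batch_size=4))
print(demo_inputs['image_input'].shape, demo_inputs['tabular_input'].shape, demo_y.shape)
```

A generator like this would typically be passed as the first argument to `model.fit` together with `steps_per_epoch`, and without a separate `y`. If the images are read from disk rather than held in memory, as assumed here, the file loading would need to happen inside the loop for the selected rows.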