To preface my post: my background is in electrical engineering and hardware design. I am brand new to machine learning and am learning by doing. I am developing some RNN models to run on the Coral Dev Board Micro (FreeRTOS), and I am having trouble getting the first model to run on the hardware. The model seems to import fine, but the input tensor appears to be Float32 rather than UINT8: the "->bytes" field of input_tensor reports four times the size I expected for the input shape. I am also looking for ways to debug each step of the process on the dev board without a separate hardware debugger. I tried to simply printf the raw model data, but that seems to lock up the program and nothing prints. For context, here is the first model that I am testing:
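import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Input(shape=(1,10), batch_size=1))
# normalizationLayer is defined earlier (not shown); the summary below reports it as a Normalization layer
model.add(normalizationLayer)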
model.add(layers.LSTM(128, return_sequences=True))
model.add(layers.LSTM(128, return_sequences=True))
model.add(layers.Dense(3))
model.summary()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='mean_squared_error', metrics=['mean_squared_error', 'mean_absolute_error'])
Model: "sequential"
_________________________________________________________________
 Layer (type)                   Output Shape              Param #
=================================================================
 normalization (Normalization)  (1, 1, 10)                21
 lstm (LSTM)                    (1, 1, 128)               71168
 lstm_1 (LSTM)                  (1, 1, 128)               131584
 dense (Dense)                  (1, 1, 3)                 387
=================================================================
Total params: 203160 (793.60 KB)
Trainable params: 203139 (793.51 KB)
Non-trainable params: 21 (88.00 Byte)
_________________________________________________________________
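As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a minimal sketch using the standard formulas for LSTM and Dense layers; the helper names (lstm_params, dense_params, normalization_params) are my own and not part of Keras, and the Normalization count assumes one mean and one variance per feature plus a single sample counter.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias vector
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim: int, units: int) -> int:
    # weight matrix plus one bias per output unit
    return input_dim * units + units


def normalization_params(features: int) -> int:
    # assumed layout: mean and variance per feature, plus one scalar count
    return 2 * features + 1


counts = {
    "normalization": normalization_params(10),
    "lstm": lstm_params(10, 128),
    "lstm_1": lstm_params(128, 128),
    "dense": dense_params(128, 3),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print(f"total: {total}")
```

The numbers line up with the summary, so the layer shapes are what I intended before any conversion happens.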
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_mean_absolute_error',
    patience=10,
    min_delta=0.0001,
    mode='min',
    verbose=1,
    restore_best_weights=True
)
history = model.fit(X_train, Y_train, epochs=10000, batch_size=1, validation_data=(X_test, Y_test), callbacks=[early_stopping], verbose=1)
# Save the TensorFlow model
model.save("SavedModel")
2024-03-12 00:02:11.836962: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2024-03-12 00:02:11.950956: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2d24007160 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-03-12 00:02:11.951005: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA A100-SXM4-80GB, Compute Capability 8.0
2024-03-12 00:02:11.959259: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-03-12 00:02:12.163354: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Epoch 1/10000
7920/7920 [==============================] - 47s 5ms/step - loss: 5.9866e-06 - mean_squared_error: 5.9866e-06 - mean_absolute_error: 0.0017 - val_loss: 2.2417e-06 - val_mean_squared_error: 2.2417e-06 - val_mean_absolute_error: 0.0012
Epoch 2/10000
7920/7920 [==============================] - 43s 5ms/step - loss: 1.9079e-06 - mean_squared_error: 1.9079e-06 - mean_absolute_error: 0.0011 - val_loss: 1.3619e-06 - val_mean_squared_error: 1.3619e-06 - val_mean_absolute_error: 9.0656e-04
Epoch 3/10000
7920/7920 [==============================] - 41s 5ms/step - loss: 1.4972e-06 - mean_squared_error: 1.4972e-06 - mean_absolute_error: 9.5085e-04 - val_loss: 1.8010e-06 - val_mean_squared_error: 1.8010e-06 - val_mean_absolute_error: 0.0010
Epoch 4/10000
7920/7920 [==============================] - 39s 5ms/step - loss: 1.2541e-06 - mean_squared_error: 1.2541e-06 - mean_absolute_error: 8.7918e-04 - val_loss: 1.2550e-06 - val_mean_squared_error: 1.2550e-06 - val_mean_absolute_error: 8.8195e-04
Epoch 5/10000
7920/7920 [==============================] - 40s 5ms/step - loss: 1.1424e-06 - mean_squared_error: 1.1424e-06 - mean_absolute_error: 8.3625e-04 - val_loss: 1.5958e-06 - val_mean_squared_error: 1.5958e-06 - val_mean_absolute_error: 0.0010
Epoch 6/10000
7920/7920 [==============================] - 39s 5ms/step - loss: 1.0729e-06 - mean_squared_error: 1.0729e-06 - mean_absolute_error: 8.1175e-04 - val_loss: 1.0057e-06 - val_mean_squared_error: 1.0057e-06 - val_mean_absolute_error: 8.1091e-04
Epoch 7/10000
7920/7920 [==============================] - 41s 5ms/step - loss: 9.8139e-07 - mean_squared_error: 9.8139e-07 - mean_absolute_error: 7.7363e-04 - val_loss: 9.2954e-07 - val_mean_squared_error: 9.2954e-07 - val_mean_absolute_error: 7.5575e-04
Epoch 8/10000
7920/7920 [==============================] - 41s 5ms/step - loss: 9.2890e-07 - mean_squared_error: 9.2890e-07 - mean_absolute_error: 7.5270e-04 - val_loss: 6.0327e-07 - val_mean_squared_error: 6.0327e-07 - val_mean_absolute_error: 5.9428e-04
Epoch 9/10000
7920/7920 [==============================] - 38s 5ms/step - loss: 8.8836e-07 - mean_squared_error: 8.8836e-07 - mean_absolute_error: 7.3885e-04 - val_loss: 8.4576e-07 - val_mean_squared_error: 8.4576e-07 - val_mean_absolute_error: 7.3719e-04
Epoch 10/10000
7920/7920 [==============================] - 37s 5ms/step - loss: 8.5854e-07 - mean_squared_error: 8.5854e-07 - mean_absolute_error: 7.2729e-04 - val_loss: 6.6495e-07 - val_mean_squared_error: 6.6495e-07 - val_mean_absolute_error: 6.2881e-04
Epoch 11/10000
7920/7920 [==============================] - 37s 5ms/step - loss: 8.2996e-07 - mean_squared_error: 8.2996e-07 - mean_absolute_error: 7.1328e-04 - val_loss: 6.7866e-07 - val_mean_squared_error: 6.7866e-07 - val_mean_absolute_error: 6.3946e-04
Epoch 12/10000
7920/7920 [==============================] - 37s 5ms/step - loss: 8.0712e-07 - mean_squared_error: 8.0712e-07 - mean_absolute_error: 7.0461e-04 - val_loss: 7.0616e-07 - val_mean_squared_error: 7.0616e-07 - val_mean_absolute_error: 6.5057e-04
Epoch 13/10000
7920/7920 [==============================] - 37s 5ms/step - loss: 7.9944e-07 - mean_squared_error: 7.9944e-07 - mean_absolute_error: 7.0053e-04 - val_loss: 6.3214e-07 - val_mean_squared_error: 6.3214e-07 - val_mean_absolute_error: 6.2446e-04
Epoch 14/10000
7920/7920 [==============================] - 36s 5ms/step - loss: 7.6923e-07 - mean_squared_error: 7.6923e-07 - mean_absolute_error: 6.8588e-04 - val_loss: 8.0098e-07 - val_mean_squared_error: 8.0098e-07 - val_mean_absolute_error: 7.1634e-04
Epoch 15/10000
7920/7920 [==============================] - 36s 5ms/step - loss: 7.5224e-07 - mean_squared_error: 7.5224e-07 - mean_absolute_error: 6.8033e-04 - val_loss: 6.7610e-07 - val_mean_squared_error: 6.7610e-07 - val_mean_absolute_error: 6.3621e-04
Epoch 16/10000
7920/7920 [==============================] - 36s 5ms/step - loss: 7.2559e-07 - mean_squared_error: 7.2559e-07 - mean_absolute_error: 6.6838e-04 - val_loss: 7.3421e-07 - val_mean_squared_error: 7.3421e-07 - val_mean_absolute_error: 6.6853e-04
Epoch 17/10000
7920/7920 [==============================] - 36s 4ms/step - loss: 7.2084e-07 - mean_squared_error: 7.2084e-07 - mean_absolute_error: 6.6688e-04 - val_loss: 8.3019e-07 - val_mean_squared_error: 8.3019e-07 - val_mean_absolute_error: 7.4651e-04
Epoch 18/10000
7918/7920 [============================>.] - ETA: 0s - loss: 7.1752e-07 - mean_squared_error: 7.1752e-07 - mean_absolute_error: 6.6448e-04Restoring model weights from the end of the best epoch: 8.
7920/7920 [==============================] - 35s 4ms/step - loss: 7.1745e-07 - mean_squared_error: 7.1745e-07 - mean_absolute_error: 6.6445e-04 - val_loss: 9.7731e-07 - val_mean_squared_error: 9.7731e-07 - val_mean_absolute_error: 8.0092e-04
Epoch 18: early stopping
INFO:tensorflow:Assets written to: SavedModel/assets
I've trained and tested the TensorFlow model and it performs exactly as expected. After training, I saved it as a SavedModel and then attempted to convert it to a fully integer-quantized TFLite model (in order to run on the Edge TPU).
dataset = tf.data.Dataset.from_tensor_slices(X_train)
# Build a representative dataset from dataset for quantization
def representative_dataset():
    for data in tf.data.Dataset.from_tensor_slices((X_train)).batch(1).take(100):
        yield [tf.dtypes.cast(data, tf.float32)]
# Convert the model to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('./SavedModel') # path to the SavedModel directory
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.SELECT_TF_OPS]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
# Save the model.
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
2024-03-12 00:15:48.250830: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-03-12 00:15:48.250919: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-03-12 00:15:48.251240: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ./SavedModel
2024-03-12 00:15:48.263954: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-03-12 00:15:48.264021: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: ./SavedModel
2024-03-12 00:15:48.294255: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-03-12 00:15:48.421845: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: ./SavedModel
2024-03-12 00:15:48.502468: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 251230 microseconds.
2024-03-12 00:15:48.698098: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2245] Estimated count of arithmetic ops: 791 ops, equivalently 395 MACs
fully_quantize: 0, inference_type: 6, input_inference_type: UINT8, output_inference_type: UINT8
2024-03-12 00:15:48.885149: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2245] Estimated count of arithmetic ops: 791 ops, equivalently 395 MACs
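One check that does not need the board at all is to load model.tflite on the host and list the dtype and quantization parameters of its input and output tensors. The sketch below assumes the regular TensorFlow Python package is installed; tf.lite.Interpreter, get_input_details and get_output_details are part of that package, while summarize and inspect_tflite are hypothetical helper names of my own.

```python
import os


def summarize(details):
    """Return one (name, shape, dtype, scale, zero_point) tuple per tensor."""
    rows = []
    for d in details:
        scale, zero_point = d["quantization"]
        # The interpreter reports dtype as a NumPy type such as numpy.uint8
        dtype = getattr(d["dtype"], "__name__", str(d["dtype"]))
        shape = tuple(int(n) for n in d["shape"])
        rows.append((d["name"], shape, dtype, scale, zero_point))
    return rows


def inspect_tflite(path):
    # Imported lazily so summarize() can be used without TensorFlow installed
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    return {
        "inputs": summarize(interpreter.get_input_details()),
        "outputs": summarize(interpreter.get_output_details()),
    }


if __name__ == "__main__" and os.path.exists("model.tflite"):
    for kind, rows in inspect_tflite("model.tflite").items():
        for row in rows:
            print(kind, row)
```

If the host reports uint8 here, the file itself is probably fine and the mismatch is more likely in how the firmware reads the tensors; if it reports float32, the conversion settings would be the place to look.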
Is there something that I'm overlooking that is causing the model to import with FLOAT32 rather than UINT8?
When the inference loop at the end of this post runs on the board, I would expect the output tensor to be 3 features wide, but it prints 12 values.
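For reference, the "bytes" field of a TFLite tensor is a size in bytes, not an element count, so the expected values depend on the element type. A small sketch of the arithmetic for my input and output shapes (tensor_bytes and BYTES_PER_ELEMENT are names I made up for this illustration):

```python
from math import prod

# Element sizes for the dtypes relevant to this model
BYTES_PER_ELEMENT = {"float32": 4, "uint8": 1, "int8": 1}


def tensor_bytes(shape, dtype):
    """Size in bytes of a dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


for name, shape in (("input", (1, 1, 10)), ("output", (1, 1, 3))):
    for dtype in ("float32", "uint8"):
        print(f"{name} {shape} as {dtype}: {tensor_bytes(shape, dtype)} bytes")
```

A 12-byte output is what 3 float32 values would occupy, and a uint8 output of the same shape would be 3 bytes, which is why I suspect the tensors are Float32.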
auto* input_tensor = interpreter.input_tensor(0);
auto* output_tensor = interpreter.output_tensor(0);
uint32_t i = 0;
while (true) {
  tflite::GetTensorData<float_t>(input_tensor)[0] = test_data[i];
  printf("Counter: %lu\r\n", i);
  if (interpreter.Invoke() != kTfLiteOk) {
    printf("Failed to invoke\r\n");
    vTaskSuspend(nullptr);
  }
  // Print the results
  for (uint8_t j = 0; j < output_tensor->bytes; j++){
    printf("%d ", tflite::GetTensorData<uint8_t>(output_tensor)[j]);
  }
  i++;
  taskYIELD();
}
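If the tensors do turn out to be uint8, my understanding is that the loop above could not write a float straight into the input or read the output bytes as final values; each value would go through the affine mapping real = scale * (q - zero_point) from the TFLite quantization scheme. Below is a minimal Python sketch of that mapping. The scale and zero_point values are hypothetical placeholders; the real ones would come from the tensor's quantization parameters.

```python
def quantize(real, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to uint8 using q = round(real / scale) + zero_point."""
    q = round(real / scale) + zero_point
    # Clamp to the representable uint8 range
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized value back to a real value."""
    return scale * (q - zero_point)


# Hypothetical parameters, for illustration only
scale, zero_point = 0.5, 128

q = quantize(1.0, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The same two formulas would apply in the C++ loop, with the input quantized before Invoke() and each output byte dequantized after it.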