Kaggle GPU and TPU do not work, code executes on the CPU


My goal is to build an LSTM autoencoder using tensorflow.keras. For this I want to use the Kaggle GPUs/TPUs. However, even though my account is verified and I have selected an accelerator, the code runs on the CPU and is very slow.
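
A quick way to confirm which devices TensorFlow can actually see is to list them and count them by type. This is a minimal sketch, assuming TensorFlow is installed in the notebook; `count_device_types` is a hypothetical helper name, and on a TPU runtime the TPU devices may only appear after the TPU system has been initialized.

```python
from collections import Counter


def count_device_types(devices):
    """Count devices by their `device_type` attribute ('CPU', 'GPU', 'TPU', ...)."""
    return Counter(d.device_type for d in devices)


try:
    import tensorflow as tf

    # Logical devices are what TensorFlow will place operations on.
    devices = tf.config.list_logical_devices()
except ImportError:
    # TensorFlow is not available here; nothing to list.
    devices = []

print(count_device_types(devices))
```

If only `CPU` shows up after selecting an accelerator, the accelerator is not visible to this TensorFlow process.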

Based on the Kaggle documentation for Keras, I wrote the following implementation to train the model, without success.

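import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed

data = np.load("/kaggle/working/dataStack.npy")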

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()

# instantiate a distribution strategy
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.TPUStrategy(tpu)

with tpu_strategy.scope():
    # use 'maxlen' as the number of time steps.
    n_steps = maxlen
    n_features = 3  # position, torque, thrust

    # Define the model
    model = Sequential()

    # Input layer
    model.add(Input(shape=(n_steps, n_features)))

    # LSTM layer
    model.add(LSTM(128, activation='relu'))

    # Encoder layer
    model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))

    # Repeater layer to prepare encoder output for decoding
    model.add(RepeatVector(n_steps))

    # Decoder layer
    model.add(LSTM(128, activation='relu', return_sequences=True))

    # Output layer
    model.add(TimeDistributed(Dense(n_features)))

    model.compile(optimizer='adam', loss='mse', steps_per_execution=32)

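# Autoencoder: the input is also the reconstruction target.
# Assumption: X is not defined in the snippet as posted; X = data is the usual choice.
X = data
y = data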

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

BATCH_SIZE = 16 * tpu_strategy.num_replicas_in_sync

model.fit(X_train, y_train, epochs=10, batch_size=BATCH_SIZE, validation_data=(X_test, y_test))
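
The batch-size arithmetic in the code above, together with `steps_per_execution=32`, can be sanity-checked without a TPU. This is a rough sketch, assuming 8 replicas (the core count reported in the log); the sample count of 800 is a made-up placeholder, not the size of `dataStack.npy`, and `batching_summary` is a hypothetical helper.

```python
import math


def batching_summary(n_samples, per_replica_batch, n_replicas, steps_per_execution):
    """Derive global batch size and step counts for one epoch."""
    global_batch = per_replica_batch * n_replicas
    steps_per_epoch = math.ceil(n_samples / global_batch)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        # Size of the final, possibly partial, batch of the epoch.
        "last_batch_size": n_samples - (steps_per_epoch - 1) * global_batch,
        # Number of grouped executions needed to cover one epoch.
        "executions_per_epoch": math.ceil(steps_per_epoch / steps_per_execution),
    }


# Placeholder numbers: 800 training samples, 16 per replica, 8 replicas.
summary = batching_summary(
    n_samples=800, per_replica_batch=16, n_replicas=8, steps_per_execution=32
)
print(summary)
# {'global_batch': 128, 'steps_per_epoch': 7, 'last_batch_size': 32, 'executions_per_epoch': 1}
```

With numbers like these, an epoch has fewer steps than `steps_per_execution`, and the last batch is smaller than the global batch, so both values are worth checking against the real dataset size.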

When the model is fitting, it runs on the CPU. This is the output:

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.
INFO:tensorflow:Initializing the TPU system: local
2024-03-26 08:32:59.510852: E external/local_xla/xla/stream_executor/stream_executor_internal.h:177] SetPriority unimplemented for this stream.
2024-03-26 08:32:59.510965: E external/local_xla/xla/stream_executor/stream_executor_internal.h:177] SetPriority unimplemented for this stream.
2024-03-26 08:32:59.511062: E external/local_xla/xla/stream_executor/stream_executor_internal.h:177] SetPriority unimplemented for this stream.
....(continues)
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711441984.528757      13 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.

Epoch 1/10
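
The device list in the log above can be tallied mechanically, which makes it easier to see what TensorFlow reported at initialization. This is a small sketch using only the standard library; `LOG_LINES` is an abbreviated copy of three lines from the log above, and `count_available_devices` is a hypothetical helper.

```python
import re
from collections import Counter

# Captures the device name and the device type from an "Available Device" log line.
DEVICE_RE = re.compile(r"Available Device: _DeviceAttributes\(([^,]+), (\w+),")


def count_available_devices(lines):
    """Count reported devices by type ('CPU', 'TPU', 'TPU_SYSTEM', ...)."""
    counts = Counter()
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Three lines copied from the log above (the full log has eight TPU entries).
LOG_LINES = [
    "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)",
    "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:0, TPU, 0, 0)",
    "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)",
]

print(count_available_devices(LOG_LINES))
```

Run against the full log, this would count the `TPU:0` through `TPU:7` entries, which matches the "Num TPU Cores: 8" line and suggests the TPU was at least detected.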

I would appreciate any help, since I am new to Kaggle and TensorFlow.
