Why is Colab faster than a Cloud TPU VM?


I am running the same simple test on Google Colab and on a Cloud TPU VM. The TPU VM software version is tpu-vm-tf-2.14.1.

Why is the Cloud TPU VM so much slower?

import time
from datetime import datetime

import numpy as np
import tensorflow as tf

# Connect to the TPU and initialize it.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')  # Cloud: tpu='local'
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

# Two n x n integer matrices with random entries in [0, 10).
n = 10000
A = tf.constant(np.random.randint(10, size=(n, n)).astype('i'))
B = tf.constant(np.random.randint(10, size=(n, n)).astype('i'))

# "CPU" run: 10 matmuls with no explicit device scope.
cpu_i = time.time()
for i in range(10):
    C = tf.matmul(A, B)
cpu_f = time.time()

# TPU run: the same 10 matmuls under an explicit TPU device scope,
# with a timestamp printed before each one.
tpu_i = time.time()
print("=> ", datetime.now())
with tf.device('/TPU:0'):
    for i in range(10):
        print("=> ", datetime.now())
        C = tf.matmul(A, B)
print("=> ", datetime.now())
tpu_f = time.time()

print('CPU time: ', cpu_f - cpu_i)
print('TPU time: ', tpu_f - tpu_i)
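When comparing runs like this, it can help to time each call only after its result has actually been materialized, and to keep a warm-up call out of the measurement. The helper below is a minimal sketch of that idea in plain Python. The names `benchmark`, `sync`, `warmup`, and `repeats` are hypothetical and introduced here for illustration; they are not part of TensorFlow. The sketch assumes that whatever is passed as `sync` blocks until the result is ready.

```python
import time


def benchmark(fn, sync=lambda result: result, warmup=1, repeats=10):
    """Return a list of per-call wall-clock times (in seconds) for fn().

    `sync` is assumed to block until the value returned by fn() is ready.
    """
    # Warm-up calls are executed but not timed, so one-off setup costs
    # do not end up in the reported numbers.
    for _ in range(warmup):
        sync(fn())

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn())  # wait for the result before stopping the clock
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without TensorFlow or a TPU.
    timings = benchmark(lambda: sum(i * i for i in range(100_000)), repeats=5)
    print("per-call seconds:", [round(t, 6) for t in timings])
```

With the test above, this could be called as `benchmark(lambda: tf.matmul(A, B), sync=lambda t: t.numpy())` inside the relevant `tf.device` scope. Note that `.numpy()` also copies the result to the host, so that copy is included in the measured time.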

Results - Google Colab:

=> 2024-01-17 20:16:02.877197
=> 2024-01-17 20:16:02.881194
=> 2024-01-17 20:16:02.882695
=> 2024-01-17 20:16:02.883767
=> 2024-01-17 20:16:02.884937
=> 2024-01-17 20:16:02.885979
=> 2024-01-17 20:16:02.887130
=> 2024-01-17 20:16:02.888226
=> 2024-01-17 20:16:02.889353
=> 2024-01-17 20:16:02.890635
=> 2024-01-17 20:16:02.891744
=> 2024-01-17 20:16:02.893052

CPU time: 0.005396604537963867
TPU time: 0.016071081161499023

Results - Cloud TPU VM:

=> 2024-01-17 20:40:30.312673
=> 2024-01-17 20:40:30.313365
=> 2024-01-17 20:40:33.789081
=> 2024-01-17 20:40:34.694601
=> 2024-01-17 20:40:35.600071
=> 2024-01-17 20:40:36.505534
=> 2024-01-17 20:40:37.410991
=> 2024-01-17 20:40:38.316408
=> 2024-01-17 20:40:39.221870
=> 2024-01-17 20:40:40.127342
=> 2024-01-17 20:40:41.032802
=> 2024-01-17 20:40:41.938277

CPU time: 31.648191690444946
TPU time: 11.625694751739502
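The timestamps above already show where the Cloud TPU VM time goes. The following sketch uses only the standard library (the helper name `deltas` is introduced here for illustration) to compute the gaps between consecutive `=>` lines from both runs. On the Cloud TPU VM the first `tf.matmul` call takes about 3.48 s and each later call about 0.905 s, whereas on Colab every gap is between roughly 1 ms and 4 ms. For scale, one 10000 x 10000 matmul is about 10^12 multiply-add operations, so gaps of about 1 ms may mean that the Colab loop returns before the computation has finished. That is an assumption, not something these logs confirm.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
DAY = "2024-01-17 "

# Timestamps copied from the "=>" lines of the two runs above.
COLAB = [DAY + t for t in (
    "20:16:02.877197", "20:16:02.881194", "20:16:02.882695", "20:16:02.883767",
    "20:16:02.884937", "20:16:02.885979", "20:16:02.887130", "20:16:02.888226",
    "20:16:02.889353", "20:16:02.890635", "20:16:02.891744", "20:16:02.893052",
)]
CLOUD = [DAY + t for t in (
    "20:40:30.312673", "20:40:30.313365", "20:40:33.789081", "20:40:34.694601",
    "20:40:35.600071", "20:40:36.505534", "20:40:37.410991", "20:40:38.316408",
    "20:40:39.221870", "20:40:40.127342", "20:40:41.032802", "20:40:41.938277",
)]


def deltas(stamps):
    """Seconds elapsed between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, FMT) for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


colab_d = deltas(COLAB)
cloud_d = deltas(CLOUD)

# Index 0 is the gap before the first loop iteration starts;
# indices 1..10 each contain one tf.matmul call.
for name, d in (("Colab", colab_d), ("Cloud TPU VM", cloud_d)):
    print(name)
    print("  before loop :", round(d[0], 6))
    print("  first matmul:", round(d[1], 6))
    print("  later matmul:", [round(x, 6) for x in d[2:]])
```

The gaps for each run add up to approximately the reported `TPU time` (about 11.63 s on the Cloud TPU VM and about 0.016 s on Colab), so the printed timestamps and the timer agree with each other.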

Thanks a lot!
