I am running a simple matrix-multiplication test on both Google Colab and a Cloud TPU VM (software version: tpu-vm-tf-2.14.1).
Why is the Cloud TPU VM so much slower than Colab?

import numpy as np
import tensorflow as tf
import time
from datetime import datetime
import os

# Connect to the TPU and initialize it.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')  # Cloud: tpu='local'
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

# Two n x n integer matrices with random entries in [0, 10).
n = 10000
A = tf.constant(np.random.randint(10, size=(n, n)).astype('i'))
B = tf.constant(np.random.randint(10, size=(n, n)).astype('i'))

# "CPU" run: 10 matmuls with no explicit device placement.
cpu_i = time.time()
for i in range(10):
    C = tf.matmul(A, B)
cpu_f = time.time()

# TPU run: 10 matmuls placed on /TPU:0, with a timestamp before each one.
tpu_i = time.time()
print("=> ", datetime.now())
with tf.device('/TPU:0'):
    for i in range(10):
        print("=> ", datetime.now())
        C = tf.matmul(A, B)
print("=> ", datetime.now())
tpu_f = time.time()

print('CPU time: ', cpu_f - cpu_i)
print('TPU time: ', tpu_f - tpu_i)

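For comparison, here is a minimal sketch of a timing harness that reports the first call separately from the following ones. The helper name `benchmark` is hypothetical and not part of TensorFlow, and the workload below is only a stand-in so that the sketch runs without a TPU. With the code above, one could pass something like `lambda: tf.matmul(A, B).numpy()`, on the assumption that calling `.numpy()` makes the call wait until the result is actually available on the host.

```python
import time

def benchmark(fn, repeats=10):
    """Time one warm-up call, then `repeats` further calls of fn.

    Returns (warmup_seconds, list_of_per_call_seconds).
    fn is assumed to block until its result is ready; otherwise only
    the time needed to dispatch the work is measured.
    """
    start = time.perf_counter()
    fn()
    warmup = time.perf_counter() - start

    per_call = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        per_call.append(time.perf_counter() - start)
    return warmup, per_call

# Stand-in workload (plain Python) so the sketch runs anywhere.
warmup, per_call = benchmark(lambda: sum(range(100_000)), repeats=5)
print(f"warm-up: {warmup:.6f} s, mean: {sum(per_call) / len(per_call):.6f} s")
```

Separating the first call matters here because the first timed step may include one-off setup costs that later steps do not pay.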
Results - Google Colab:
=> 2024-01-17 20:16:02.877197
=> 2024-01-17 20:16:02.881194
=> 2024-01-17 20:16:02.882695
=> 2024-01-17 20:16:02.883767
=> 2024-01-17 20:16:02.884937
=> 2024-01-17 20:16:02.885979
=> 2024-01-17 20:16:02.887130
=> 2024-01-17 20:16:02.888226
=> 2024-01-17 20:16:02.889353
=> 2024-01-17 20:16:02.890635
=> 2024-01-17 20:16:02.891744
=> 2024-01-17 20:16:02.893052
CPU time: 0.005396604537963867
TPU time: 0.016071081161499023
Results - Cloud TPU VM:
=> 2024-01-17 20:40:30.312673
=> 2024-01-17 20:40:30.313365
=> 2024-01-17 20:40:33.789081
=> 2024-01-17 20:40:34.694601
=> 2024-01-17 20:40:35.600071
=> 2024-01-17 20:40:36.505534
=> 2024-01-17 20:40:37.410991
=> 2024-01-17 20:40:38.316408
=> 2024-01-17 20:40:39.221870
=> 2024-01-17 20:40:40.127342
=> 2024-01-17 20:40:41.032802
=> 2024-01-17 20:40:41.938277
CPU time: 31.648191690444946
TPU time: 11.625694751739502
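The per-step cost on the Cloud TPU VM can be read off the timestamps above. The following sketch only parses the printed log lines and computes the gap between consecutive timestamps; the helper name `gaps_in_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the Cloud TPU VM output above.
cloud_log = """\
2024-01-17 20:40:30.312673
2024-01-17 20:40:30.313365
2024-01-17 20:40:33.789081
2024-01-17 20:40:34.694601
2024-01-17 20:40:35.600071
2024-01-17 20:40:36.505534
2024-01-17 20:40:37.410991
2024-01-17 20:40:38.316408
2024-01-17 20:40:39.221870
2024-01-17 20:40:40.127342
2024-01-17 20:40:41.032802
2024-01-17 20:40:41.938277
"""

def gaps_in_seconds(log_text):
    """Return the elapsed seconds between consecutive timestamp lines."""
    stamps = [
        datetime.strptime(line.strip(), "%Y-%m-%d %H:%M:%S.%f")
        for line in log_text.splitlines()
        if line.strip()
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

gaps = gaps_in_seconds(cloud_log)
first_matmul = gaps[1]  # gap spanning the first matmul inside the loop
steady = gaps[2:]       # gaps spanning the remaining nine matmuls

print(f"first matmul: {first_matmul:.3f} s")                     # first matmul: 3.476 s
print(f"steady-state mean: {sum(steady) / len(steady):.3f} s")   # steady-state mean: 0.905 s
```

So on the Cloud TPU VM the first multiplication takes about 3.5 s and each later one about 0.9 s, which adds up to the reported TPU time of roughly 11.6 s.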
- Thanks a lot!