I recently started using TPU v3-8 VMs to train language models and, until now, had not had any issues with VMs crashing or the like. However, one of my TPU VMs now seems to have broken out of nowhere, and I am completely lost.

When trying to ssh to the VM, I get the following error message:

ERROR: (gcloud.alpha.compute.tpus.tpu-vm.ssh) INVALID_ARGUMENT: Cloud TPU received an invalid argument. The "GuestAttributes" value "" was not found. [EID: 0xdffd54714f63b861]

I also cannot start or stop (only delete) the VM from https://console.cloud.google.com/compute/tpus because its status is "unknown".

Is there any way I can get the VM running again?


This error can be transient and may go away if you retry the connection a couple of times. Does it still occur after several attempts?
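If you want to automate the retries rather than rerun the command by hand, here is a minimal sketch in Python. It assumes the `gcloud` CLI is installed and authenticated. The VM name `my-tpu-vm`, the zone `europe-west4-a`, the attempt count, and the delay are placeholders I made up, not values from your setup. The helper names `tpu_ssh_command` and `retry_command` are likewise just illustrative.

```python
import subprocess
import time


def tpu_ssh_command(name, zone):
    """Build the gcloud SSH command as an argument list.

    `--command true` makes the call return right after a successful
    login instead of opening an interactive shell.
    """
    return [
        "gcloud", "alpha", "compute", "tpus", "tpu-vm", "ssh", name,
        "--zone", zone,
        "--command", "true",
    ]


def retry_command(cmd, attempts=5, delay_s=30.0, runner=subprocess.run):
    """Run `cmd` until it exits with status 0 or attempts run out.

    Returns True on the first success, False if every attempt failed.
    `runner` is injectable so the loop can be tried without gcloud.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Wait before the next try; transient errors may need time to clear.
            time.sleep(delay_s)
    return False


# Example usage (placeholder name and zone, replace with your own):
# ok = retry_command(tpu_ssh_command("my-tpu-vm", "europe-west4-a"))
# print("reachable" if ok else "still failing after retries")
```

If this still returns `False` after several attempts spread over a few minutes, the error is probably not transient.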