Does Vertex AI Training for Distributed Training Across Multi-Nodes Work With HuggingFace Trainer + Deepspeed?

170 Views Asked by esdy At 17 August 2025 at 19:01

Can Vertex AI Training be used for distributed training with the Hugging Face Trainer and DeepSpeed? All the examples I have seen use the native PyTorch distributed strategy.

It would be very helpful if someone could tell me:

1. Whether DeepSpeed is supported
2. How to integrate DeepSpeed when doing multi-node training on Vertex AI


There is 1 best solution below

Joevanie On 08 August 2023 at 18:41

You can build a custom training image that contains your DeepSpeed training code, push the Docker image to Artifact Registry, and then run the fine-tuning job on Vertex AI as a custom container job.
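To make the multi-node part concrete, here is a minimal sketch of how such a job could be described with the `google-cloud-aiplatform` Python SDK. Vertex AI custom jobs take a list of worker pools: the first pool holds a single primary replica and the second holds the remaining workers. The image URI, project, bucket, machine type, and accelerator settings are placeholders I made up for illustration, and the helper names `build_worker_pool_specs` and `submit_job` are hypothetical. How DeepSpeed is launched inside the container (entrypoint, rank and master address discovery) is up to your image and is not shown here.

```python
from typing import Any, Dict, List


def build_worker_pool_specs(
    image_uri: str,
    num_nodes: int,
    machine_type: str = "a2-highgpu-1g",          # placeholder machine type
    accelerator_type: str = "NVIDIA_TESLA_A100",  # placeholder accelerator
    accelerator_count: int = 1,
) -> List[Dict[str, Any]]:
    """Build worker pool specs: one primary replica plus (num_nodes - 1) workers."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    def pool(replica_count: int) -> Dict[str, Any]:
        return {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": accelerator_type,
                "accelerator_count": accelerator_count,
            },
            "replica_count": replica_count,
            # Every replica runs the same custom training image.
            "container_spec": {"image_uri": image_uri},
        }

    specs = [pool(1)]  # pool 0: the primary replica
    if num_nodes > 1:
        specs.append(pool(num_nodes - 1))  # pool 1: the remaining workers
    return specs


def submit_job(project: str, location: str, staging_bucket: str,
               specs: List[Dict[str, Any]]) -> None:
    """Submit the job. Requires google-cloud-aiplatform and GCP credentials."""
    from google.cloud import aiplatform  # imported lazily so the helper above has no dependency

    aiplatform.init(project=project, location=location, staging_bucket=staging_bucket)
    job = aiplatform.CustomJob(
        display_name="deepspeed-multinode",  # placeholder name
        worker_pool_specs=specs,
    )
    job.run()


if __name__ == "__main__":
    # Placeholder image URI; replace with the image you pushed to Artifact Registry.
    specs = build_worker_pool_specs(
        "us-docker.pkg.dev/my-project/my-repo/deepspeed-train:latest", num_nodes=2
    )
    print(specs)
```

Only `build_worker_pool_specs` runs without cloud access, so you can inspect the spec before calling `submit_job` with your own project, region, and bucket.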

This post on Fine-tuning with DeepSpeed and Vertex AI explains it pretty well.
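On the Trainer side, the usual integration point is the `deepspeed` argument of `TrainingArguments`, which accepts a path to a DeepSpeed JSON config (or a dict). The sketch below writes a small config file that the training code in your image could load. The ZeRO stage and the file name `ds_config.json` are assumptions for illustration, not values taken from the linked post; the `"auto"` entries let the Trainer fill in values from its own arguments.

```python
import json
from pathlib import Path


def build_deepspeed_config(zero_stage: int = 2) -> dict:
    """Return a minimal DeepSpeed config for use with the Hugging Face Trainer."""
    return {
        # "auto" tells the Hugging Face integration to derive the value
        # from the corresponding TrainingArguments field.
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "bf16": {"enabled": "auto"},
        "zero_optimization": {"stage": zero_stage},  # assumed stage, tune as needed
    }


def write_deepspeed_config(path: str, config: dict) -> Path:
    """Write the config as JSON and return the path."""
    out = Path(path)
    out.write_text(json.dumps(config, indent=2))
    return out


if __name__ == "__main__":
    write_deepspeed_config("ds_config.json", build_deepspeed_config())
    # In the training script (requires transformers and deepspeed installed):
    #   from transformers import TrainingArguments
    #   args = TrainingArguments(output_dir="out", deepspeed="ds_config.json")
```

The same config file is used on every node, so it is simplest to bake it into the image or generate it at container start.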
