Distributed training example for Temporal Fusion Transformer in SageMaker


We’re training a big Temporal Fusion Transformer using PyTorch.

We’re looking into using distributed training to accelerate our training jobs on SageMaker.

Does anyone have any examples of this? Any pattern you can recommend?

There is 1 best solution below

Arun Lokanatha On

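Although there is no direct example for this model, you should be able to follow the documentation below for PyTorch Lightning (PL), which explains how to adapt a Lightning training script to the SageMaker distributed data parallel library: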

https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-modify-sdp-pt-lightning.html
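
As a rough sketch of what that page describes, the training script mainly needs a DDP strategy that uses the "smddp" process group backend. The snippet below assumes the model is already a LightningModule, that the script runs inside a SageMaker PyTorch training container (where the smdistributed package and the SM_HOSTS / SM_NUM_GPUS environment variables are available), and that Lightning 2.x import paths apply; the function names cluster_shape and build_trainer are hypothetical helpers, not part of any library. Check the import paths against the Lightning version in your container.

```python
import json
import os


def cluster_shape(environ=os.environ):
    """Return (num_nodes, gpus_per_node) from SageMaker's environment variables.

    Falls back to a single node with one GPU when the variables are missing,
    for example when the script is run outside SageMaker.
    """
    hosts = json.loads(environ.get("SM_HOSTS", '["localhost"]'))
    gpus = int(environ.get("SM_NUM_GPUS", "1"))
    return len(hosts), gpus


def build_trainer(max_epochs=10):
    """Build a Lightning Trainer that uses the SageMaker data parallel backend.

    Imports are done lazily because these packages are only expected to be
    installed inside the SageMaker training container (an assumption).
    """
    import lightning as pl
    # Importing this module registers the "smddp" process group backend.
    import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401
    from lightning.fabric.plugins.environments.lightning import LightningEnvironment
    from lightning.pytorch.strategies import DDPStrategy

    # Let Lightning read the world size and rank set by the SageMaker launcher.
    env = LightningEnvironment()
    env.world_size = lambda: int(os.environ["WORLD_SIZE"])
    env.global_rank = lambda: int(os.environ["RANK"])

    num_nodes, num_gpus = cluster_shape()
    strategy = DDPStrategy(
        cluster_environment=env,
        process_group_backend="smddp",
        accelerator="gpu",
    )
    return pl.Trainer(
        max_epochs=max_epochs,
        strategy=strategy,
        devices=num_gpus,
        num_nodes=num_nodes,
    )
```

Inside the training script you would then call something like build_trainer().fit(model, datamodule), where model is your Temporal Fusion Transformer and datamodule is your own data pipeline.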

Refer to the notebook below for a full example of using SageMaker DDP with PyTorch Lightning.

https://github.com/aws-samples/sagemaker-distributed-training-workshop/blob/main/1_data_parallel/PyTorch%20Lightning%20on%20SageMaker.ipynb
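
For orientation, launching such a job from the SageMaker Python SDK could look roughly like the sketch below. The entry point name train.py, the role ARN, the S3 URI, and the framework and Python versions are all placeholders or assumptions; check the versions and instance types against the data parallel library's supported-configurations table. The helpers smddp_distribution, estimator_kwargs, and launch are hypothetical names used only for illustration.

```python
def smddp_distribution():
    """Distribution setting that enables the SageMaker data parallel library."""
    return {"smdistributed": {"dataparallel": {"enabled": True}}}


def estimator_kwargs(
    role,
    entry_point="train.py",            # hypothetical training script name
    instance_count=2,
    instance_type="ml.p4d.24xlarge",   # assumption: an instance type the library supports
    framework_version="2.0.1",         # assumption: verify against supported versions
    py_version="py310",                # assumption: must match the framework version
):
    """Collect the arguments for sagemaker.pytorch.PyTorch in a plain dict."""
    return {
        "entry_point": entry_point,
        "role": role,
        "instance_count": instance_count,
        "instance_type": instance_type,
        "framework_version": framework_version,
        "py_version": py_version,
        "distribution": smddp_distribution(),
    }


def launch(role, train_s3_uri):
    """Start the training job; requires the sagemaker SDK and AWS credentials."""
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**estimator_kwargs(role))
    # The channel name "train" is a common convention, not a requirement.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

Calling launch with your execution role ARN and the S3 location of your training data would submit the job; the linked notebook shows the complete flow, including the training script itself.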