Add Security groups in Amazon SageMaker for distributed training jobs

363 Views Asked by Philipp Schmid At 09 September 2022 at 14:20

We would like to enforce specific security groups to be set on the SageMaker training jobs (XGBoost in script mode). However, distributed training, in this case, won’t work out of the box, since the containers need to communicate with each other. What are the minimum inbound/outbound rules (ports) that we need to specify for training jobs so that they can communicate?


There is 1 best solution below

Kyle Gallatin On 10 September 2022 at 18:36 BEST ANSWER

Setting up training in a VPC, including specifying security groups, is documented here: https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html#train-vpc-groups
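As a rough illustration of where the security groups end up, here is a minimal sketch of the `VpcConfig` structure that the `CreateTrainingJob` API accepts. The helper name `build_vpc_config` and the IDs in the usage line are made up for the example.

```python
def build_vpc_config(security_group_ids, subnets):
    """Return a VpcConfig dict in the shape CreateTrainingJob expects.

    build_vpc_config is a hypothetical helper, not part of any AWS SDK.
    """
    if not security_group_ids or not subnets:
        raise ValueError("at least one security group and one subnet are required")
    return {
        "SecurityGroupIds": list(security_group_ids),
        "Subnets": list(subnets),
    }


# Placeholder IDs, for illustration only.
vpc_config = build_vpc_config(["sg-0123456789abcdef0"], ["subnet-0123456789abcdef0"])
```

With the SageMaker Python SDK, the same values are passed to the estimator as the `security_group_ids` and `subnets` arguments instead.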

Normally you would allow all communication between the training nodes. To do this, set both the source of the inbound rule and the destination of the outbound rule to the security group itself, and allow all IPv4 traffic. If you want to figure out which ports are used, you could:

1/ Define the permissive security group.
2/ Turn on VPC Flow Logs.
3/ Run training.
4/ Examine the VPC Flow Logs.
5/ Update the security group to allow only the required ports.
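The steps above could be sketched roughly as follows. This assumes boto3 with valid AWS credentials for the rule creation, and flow log records in the default (version 2) format. The function names are made up for the example, and the port tally is only a starting point: return traffic shows up with ephemeral destination ports, but because security groups are stateful, only the ports on which connections are initiated need a rule.

```python
from collections import Counter

# Field order of the default (version 2) VPC flow log record format.
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def self_referencing_rule(group_id):
    """Build an IpPermissions entry that allows all traffic from the group itself."""
    return {
        "IpProtocol": "-1",  # -1 means all protocols and all ports
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def allow_intra_group_traffic(group_id):
    """Step 1: apply the permissive rule in both directions (calls AWS)."""
    import boto3  # imported lazily so the pure helpers work without it

    ec2 = boto3.client("ec2")
    rule = self_referencing_rule(group_id)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])
    ec2.authorize_security_group_egress(GroupId=group_id, IpPermissions=[rule])


def accepted_ports(lines, node_ips):
    """Step 4: count (protocol, dstport) pairs for accepted node-to-node flows."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FLOW_LOG_FIELDS):
            continue  # skip header lines or custom-format records
        record = dict(zip(FLOW_LOG_FIELDS, parts))
        if record["action"] != "ACCEPT":
            continue  # also skips NODATA/SKIPDATA records, whose action is "-"
        if record["srcaddr"] in node_ips and record["dstaddr"] in node_ips:
            counts[(int(record["protocol"]), int(record["dstport"]))] += 1
    return counts
```

Calling `accepted_ports(log_lines, {"10.0.0.1", "10.0.0.2"})` with the private IPs of the training nodes returns a counter keyed by IANA protocol number (6 is TCP) and destination port, which can then inform the tightened rules in step 5.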

I must say that restricting communication between the training nodes might be extreme, so I would challenge the customer on why it is really needed: all nodes run the same job, have the same IAM role, and are transient by nature.
