AWS DataSync from EFS to S3 - connection timed out


I'm trying to create a DataSync task to copy files from EFS to S3, and I'm using Terraform to do it. From reading the documentation, it looks like I don't need a DataSync agent for this. Following the guide at https://ystoneman.medium.com/serverless-datasync-from-efs-to-s3-6cb3a7ab85f7, I have created the following:

  • Security Group. I created this security group, and assigned it to the EC2 config for the datasync source location
resource "aws_security_group" "sg-datasync" { 
  name = "datasync"
  vpc_id = "vpc-sampleVPC"
}
  • DataSync Source Location (EFS)
resource "aws_datasync_location_efs" "source_efs" {
  efs_file_system_arn =  "arn:aws:elasticfilesystem:ap-southeast-2:XXXXX:file-system/fs-6b3f3753"
  ec2_config {
    security_group_arns = [aws_security_group.sg-datasync.arn]
    subnet_arn          = "arn:aws:ec2:ap-southeast-2:XXXXX:subnet/subnet-09d919d3b76e9c7f0"
  }
}
  • DataSync Target Location (S3)
resource "aws_datasync_location_s3" "target_s3" {
  s3_bucket_arn = local.s3_arn
  subdirectory  = "/some_target_folder"

  s3_config {
    bucket_access_role_arn = local.s3_bucket_role_arn
  }
}
  • DataSync Task
resource "aws_datasync_task" "sampleTask" {
  destination_location_arn = aws_datasync_location_s3.target_s3.arn
  name                     = "sampleTask"
  source_location_arn      = aws_datasync_location_efs.source_efs.arn

  options {
    bytes_per_second = -1
  }
}

In addition to this, I have created some security-related resources:

  • Security Group rule to allow inbound NFS access from DataSync source location security group (based on what the article says "On your EFS file system mount target’s security group, allow inbound access on port 2049 from the DataSync source location’s security group.")
resource "aws_security_group_rule" "datasync_to_efs" { 
  type                     = "ingress"
  from_port                = 2049
  to_port                  = 2049
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.sg-datasync.id 
  security_group_id        = "sg-049fd2c6708c42c20"
}
  • Security Group rule to allow all outbound access on all ports to EFS file system's mount target's security group. Again, this is based on the article "On your DataSync source location’s security group, allow all outbound access on all ports to your EFS file system’s mount target’s security group"
resource "aws_security_group_rule" "egress_datasync_to_efs" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = "sg-049fd2c6708c42c20"
  security_group_id        = aws_security_group.sg-datasync.id
}

Also note that 'sg-049fd2c6708c42c20' is the EFS file system's mount target security group. At least that is what I think it is, based on the screenshot below (this is taken from the EFS network configuration for fs-6b3f3753):

[Screenshot: EFS Network configuration]

With these in place, the DataSync task and locations are created successfully. However, when I run the task, I get a connection timeout:

"Task failed to access location loc-0bdebcc42541f73e4: x40016: mount.nfs: Connection timed out"

FYI: loc-0bdebcc42541f73e4 is the source location, and the console shows the following details for it:

  • Location ID: loc-0bdebcc42541f73e4
  • Type: Amazon EFS file system
  • Path: /
  • File share: fs-6b3f3753
  • Subnet: subnet-09d919d3b76e9c7f0
  • Security groups: sg-0bb0d7ddb3dec8ca6

sg-0bb0d7ddb3dec8ca6 is the security group 'sg-datasync'. In the console, it has no inbound rules, but it has one outbound rule:

  • IP version: -
  • Type: All TCP
  • Protocol: TCP
  • Port range: 0-65535
  • Destination: sg-049fd2c6708c42c20

Looking at https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-mounting.html#mount-hangs-fails-timeout, it seems that I didn't set up either the EC2 instance or the mount target security group configuration correctly. My questions are:

  1. Where is the EC2 instance configuration in my Terraform above? Is it aws_datasync_location_efs.source_efs.ec2_config? My guess is that AWS temporarily spins up an EC2 instance to access the EFS file system, and that this block configures it.
  2. Assuming no. 1 is correct, is my setup right? That is: a) the EC2 instance uses security group 'sg-datasync', and b) the 'datasync_to_efs' rule configures the mount target security group (sg-049fd2c6708c42c20) to allow inbound NFS access from 'sg-datasync'.

Any help / pointer is very much appreciated!

There is 1 solution below:

  1. AWS doesn't seem to document whether it spins up an EC2 instance on the backend. These settings do relate to EC2 networking, though, so it is probably safe to assume that something on the backend uses this config to do the sync.

  2. Yes, it follows that DataSync will use the security group and rules you specified to access your EFS file system.

Most guides on this recommend having two separate security groups.

  • One for the EFS mount target (which has the "datasync_to_efs" rule). It should look something like this:
    resource "aws_security_group" "efs" {
      name        = "efs"
      description = "Allow traffic from Datasync to EFS"
      vpc_id      = aws_vpc.vpc.id
    }
    resource "aws_security_group_rule" "datasync_ingress" {
      security_group_id        = aws_security_group.efs.id
      description              = "Allow traffic from Datasync to EFS"
      from_port                = 2049
      to_port                  = 2049
      protocol                 = "tcp"
      type                     = "ingress"
      source_security_group_id = aws_security_group.datasync.id
    }
    
  • One for the Datasync location (allowing all egress)
    resource "aws_security_group" "datasync" {
      name        = "datasync"
      description = "Allow all egress traffic from Datasync"
      vpc_id      = aws_vpc.vpc.id
    }
    resource "aws_security_group_rule" "datasync_egress" {
      security_group_id = aws_security_group.datasync.id
      description       = "Allow all egress traffic from Datasync"
      from_port         = 0
      to_port           = 0
      protocol          = "-1"
      # Must be "egress"; an "ingress" rule here would leave DataSync with no outbound access.
      type              = "egress"
      cidr_blocks       = ["0.0.0.0/0"]
    }
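
For the two groups to have any effect, each one has to be attached to the right resource: the "efs" group to the file system's mount target, and the "datasync" group to the DataSync location's ec2_config. Below is a minimal sketch of that wiring, assuming the file system, mount target and subnet are managed in the same configuration. The names aws_efs_file_system.example and aws_subnet.example are placeholders, not taken from the question.

```hcl
# Placeholder names: aws_efs_file_system.example and aws_subnet.example
# stand in for resources you already manage; they are not from the question.

# Attach the "efs" security group to the mount target, so the port 2049
# ingress rule actually applies to NFS traffic reaching the file system.
resource "aws_efs_mount_target" "example" {
  file_system_id  = aws_efs_file_system.example.id
  subnet_id       = aws_subnet.example.id
  security_groups = [aws_security_group.efs.id]
}

# Attach the "datasync" security group to the DataSync source location.
resource "aws_datasync_location_efs" "source_efs" {
  # Referencing the mount target makes Terraform create it before the location.
  efs_file_system_arn = aws_efs_mount_target.example.file_system_arn

  ec2_config {
    security_group_arns = [aws_security_group.datasync.arn]
    subnet_arn          = aws_subnet.example.arn
  }
}
```

If the mount target already exists outside Terraform, as in the question, it is worth confirming in the EFS network settings that the group holding the port 2049 rule is actually attached to the mount target DataSync connects to.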
    

As heathesh pointed out in their comment, you should also check that the EFS file system policy and the DataSync role allow mounting. If you've already solved this issue, please share the solution!
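
To make the policy check more concrete, here is a rough sketch of a file system policy that lets a dedicated IAM role mount the file system. Everything named here is an assumption for illustration: the role name "datasync-efs-access", the aws_efs_file_system.example reference, and the choice to grant only elasticfilesystem:ClientMount (read-only), which may need widening depending on file ownership in your setup.

```hcl
# Hypothetical role that DataSync assumes when mounting the file system.
data "aws_iam_policy_document" "datasync_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["datasync.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "datasync_efs" {
  name               = "datasync-efs-access" # assumed name
  assume_role_policy = data.aws_iam_policy_document.datasync_assume.json
}

# File system policy allowing that role to mount read-only.
# Add elasticfilesystem:ClientWrite or ClientRootAccess if your case needs them.
data "aws_iam_policy_document" "efs_policy" {
  statement {
    effect    = "Allow"
    actions   = ["elasticfilesystem:ClientMount"]
    resources = [aws_efs_file_system.example.arn] # placeholder file system

    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.datasync_efs.arn]
    }
  }
}

# Caution: once a file system policy is attached, EFS no longer applies its
# default allow-all behaviour, so other NFS clients may need their own statements.
resource "aws_efs_file_system_policy" "example" {
  file_system_id = aws_efs_file_system.example.id
  policy         = data.aws_iam_policy_document.efs_policy.json
}
```

For DataSync to use such a role, the aws_datasync_location_efs resource would also need file_system_access_role_arn set to the role's ARN and in_transit_encryption set to "TLS1_2". If your file system has no policy at all, this is unlikely to be the cause of the timeout, and the security groups are the first thing to check.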