CannotPullContainerError: failed to extract layer


I'm trying to run a task in a Windows container on AWS Fargate.

The container is a .NET console application (full .NET Framework 4.5).

This is the task definition, generated programmatically with the SDK:

var taskResponse = await ecsClient.RegisterTaskDefinitionAsync(new Amazon.ECS.Model.RegisterTaskDefinitionRequest()
            {
                RequiresCompatibilities = new List<string>() { "FARGATE" },
                TaskRoleArn = TASK_ROLE_ARN,
                ExecutionRoleArn = EXECUTION_ROLE_ARN,
                Cpu = CONTAINER_CPU.ToString(),
                Memory = CONTAINER_MEMORY.ToString(),
                NetworkMode = NetworkMode.Awsvpc,
                Family = "netfullframework45consoleapp-task-definition",
                EphemeralStorage = new EphemeralStorage() { SizeInGiB = EPHEMERAL_STORAGE_SIZE_GIB },
                ContainerDefinitions = new List<Amazon.ECS.Model.ContainerDefinition>()
                {
                     new Amazon.ECS.Model.ContainerDefinition()
                     {
                        Name = "netfullframework45consoleapp-task-definition",
                        Image = "XXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/netfullframework45consoleapp:latest",
                        Cpu = CONTAINER_CPU,
                        Memory = CONTAINER_MEMORY,
                        Essential = true
                        
                        // Log configuration removed to simplify the problem
                        //,
                        //LogConfiguration = new Amazon.ECS.Model.LogConfiguration()
                        //{
                        //   LogDriver = LogDriver.Awslogs,
                        //   Options = new Dictionary<string, string>()
                        //   {
                        //      { "awslogs-create-group", "true"},  
                        //      { "awslogs-group", $"/ecs/{TASK_DEFINITION_NAME}" },
                        //      { "awslogs-region", AWS_REGION },
                        //      { "awslogs-stream-prefix", $"{TASK_DEFINITION_NAME}" }
                        //   }
                        //}
                     }
                }
            });

These are the permissions in the role policy used by the task, AmazonECSTaskExecutionRolePolicy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

I get this error when I launch the task:

CannotPullContainerError: ref pull has been retried 1 time(s): failed to extract layer sha256:fe48cee89971abac42eedb9110b61867659df00fc5b0b90dd91d6e19f704d935: link /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/ProgramData/Microsoft/Event Viewer/Views/ServerRoles/RemoteDesktop.Events.xml /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/Windows/Microsoft.NET/assembly/GAC_64/Microsoft.Windows.ServerManager.RDSPlugin/v4.0_10.0.0.0__31bf3856ad364e35/RemoteDesktop.Events.xml: no such file or directory: unknown

Some searching led me here: https://aws.amazon.com/it/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/

Point 1 says that if I run the task in a private subnet (as I am doing), I need a NAT gateway with a matching route to guarantee communication with ECR. Note, however, that my infrastructure has a VPC endpoint for ECR.

So the first question is: is a VPC endpoint sufficient to guarantee communication from the container to the container image registry (ECR)? Or do I necessarily need to implement what point 1 says (a NAT and a route in the route table), or else run the task in a public subnet?

Could the error be caused by missing connectivity to ECR, or could it be a missing-policy problem?

There is 1 best solution below

Make sure your VPC endpoints are configured correctly. Note that:

"Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both the com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints as well as the Amazon S3 gateway endpoint to take advantage of this feature."

See https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html for more information

In the first paragraph of the page I linked: "You don't need an internet gateway, a NAT device, or a virtual private gateway."
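
As a rough way to check a setup against the quoted requirement, the sketch below lists which of the three endpoint services are still missing for a region. It is only an illustration: the class and method names (EcrEndpointCheck, RequiredServices, Missing) are hypothetical, and it assumes you already have the service names of the endpoints in your VPC as plain strings (for example, copied from the VPC console). It does not call AWS and says nothing about whether an existing endpoint's security group, route table association, or policy is correct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the AWS SDK.
public static class EcrEndpointCheck
{
    // The three services named in the quoted documentation for
    // Fargate platform version 1.4.0 or later.
    public static IReadOnlyList<string> RequiredServices(string region) => new[]
    {
        $"com.amazonaws.{region}.ecr.dkr", // interface endpoint
        $"com.amazonaws.{region}.ecr.api", // interface endpoint
        $"com.amazonaws.{region}.s3",      // gateway endpoint (image layers)
    };

    // Returns the required service names that are absent from 'existing'.
    public static IReadOnlyList<string> Missing(string region, IEnumerable<string> existing)
    {
        var present = new HashSet<string>(existing, StringComparer.OrdinalIgnoreCase);
        return RequiredServices(region).Where(s => !present.Contains(s)).ToList();
    }
}
```

For the eu-west-1 registry in the question, calling EcrEndpointCheck.Missing("eu-west-1", new[] { "com.amazonaws.eu-west-1.ecr.dkr" }) would report com.amazonaws.eu-west-1.ecr.api and com.amazonaws.eu-west-1.s3 as missing, meaning a single ECR endpoint on its own would not satisfy the requirement quoted above.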