I need to run 500 jobs, but they keep getting stuck in the RUNNABLE state. When my jobs do start, they run to completion, so their configuration seems fine. It could be a capacity issue in us-east-1, or it could be one of the service quotas on my account.
Which service quotas apply in this scenario? Below are the quotas I suspect, with their applied and AWS default values. I have submitted increase requests for each of them.
| Service | Quota name | Applied account-level quota value | AWS default quota value |
|---|---|---|---|
| EC2 | Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances | 32 vCPUs | 5 vCPUs |
| EC2 | All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests | 32 vCPUs | 5 vCPUs |
| EC2 | New Reserved Instances per month | 50 | 20 |
| EC2 | EC2-VPC Elastic IPs | 5 | 5 |
(I included Elastic IPs because my Batch jobs need a public IP to pull images from ECR.)
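To compare applied and default values side by side, and to also see any Fargate-specific quotas (the jobs run on Fargate rather than EC2), a boto3 sketch like the one below might help. It assumes the Service Quotas `list_service_quotas` and `list_aws_default_service_quotas` paginators, that Fargate quotas are listed under the service code `fargate`, and credentials allowed to call those List APIs. The helper names (`values_by_code`, `compare_quotas`, `main`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers; "client" is anything shaped like a boto3 client with get_paginator().


def values_by_code(client, operation: str, service_code: str) -> Dict[str, Tuple[str, float]]:
    """Map QuotaCode -> (QuotaName, Value) across every page of a Service Quotas list call."""
    values: Dict[str, Tuple[str, float]] = {}
    for page in client.get_paginator(operation).paginate(ServiceCode=service_code):
        for quota in page["Quotas"]:
            values[quota["QuotaCode"]] = (quota["QuotaName"], quota["Value"])
    return values


def compare_quotas(client, service_code: str) -> List[Tuple[str, float, float]]:
    """Return (name, applied value, AWS default value) rows sorted by quota name.

    A quota with no applied entry is reported with its default as the applied value.
    """
    defaults = values_by_code(client, "list_aws_default_service_quotas", service_code)
    applied = values_by_code(client, "list_service_quotas", service_code)
    rows = []
    for code, (name, default_value) in defaults.items():
        applied_value = applied[code][1] if code in applied else default_value
        rows.append((name, applied_value, default_value))
    return sorted(rows)


def main() -> None:
    # boto3 is imported here so the helpers above can be exercised with a stub client.
    import boto3

    client = boto3.client("service-quotas", region_name="us-east-1")
    for service_code in ("ec2", "fargate"):  # assumption: "fargate" is Fargate's service code
        for name, applied_value, default_value in compare_quotas(client, service_code):
            print(f"{service_code}\t{name}\tapplied={applied_value}\tdefault={default_value}")
```

Calling `main()` with working AWS credentials prints one line per quota, which makes it easier to spot quotas whose applied value is still the default.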
Configuration:
- AWS Batch - Compute Environment - Max vCPUs: 5000
- AWS Batch - Job Definition - Fargate - vCPUs: 4
- AWS Batch - Job Definition - Fargate - Memory: 8 GB
- AWS Batch - Job Definition - Fargate - Ephemeral Storage: 100 GB
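As a back-of-the-envelope check on why only some jobs start, here is a small sketch. It assumes the binding limit is a single vCPU-count quota that each 4-vCPU job consumes in full, and it ignores memory, storage, and any other limits. Which quota actually governs Fargate jobs is the open question, so the 32 and 5 plugged in are simply the EC2 values from the table, for illustration. The function names are hypothetical.

```python
import math


def vcpus_needed(num_jobs: int, job_vcpus: int) -> int:
    """Total vCPUs required to run every job at the same time."""
    return num_jobs * job_vcpus


def max_concurrent_jobs(vcpu_quota: int, ce_max_vcpus: int, job_vcpus: int) -> int:
    """Jobs that fit at once under both the vCPU quota and the compute environment cap."""
    return min(vcpu_quota, ce_max_vcpus) // job_vcpus


def waves_needed(num_jobs: int, concurrent: int) -> int:
    """Back-to-back batches needed if only `concurrent` jobs can run at a time."""
    if concurrent <= 0:
        raise ValueError("no job fits under these limits")
    return math.ceil(num_jobs / concurrent)


if __name__ == "__main__":
    jobs, job_vcpus, ce_max = 500, 4, 5000  # values from the configuration above
    print("vCPUs for all jobs at once:", vcpus_needed(jobs, job_vcpus))
    for quota in (32, 5):  # EC2 values from the quota table, for illustration only
        concurrent = max_concurrent_jobs(quota, ce_max, job_vcpus)
        print(f"quota={quota}: {concurrent} concurrent, {waves_needed(jobs, concurrent)} waves")
```

With these inputs, 500 jobs need 2000 vCPUs at once, well under the 5000 Max vCPUs; a 32-vCPU quota admits 8 jobs at a time (63 waves), and a 5-vCPU quota admits only 1.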

The quotas in question were the ones in my screenshot (not reproduced here), as shown by the "utilization" column on the right.
It is infuriating that the services enforce quotas but don't tell the user when they are being enforced.