We have a Java application running on EKS as a container in a pod. It spawns 40 threads, and the K8s resources are set as:
requests:
  cpu: "1"
  memory: 1Gi
limits:
  cpu: "2"
  memory: 2Gi
Each thread (say, a parent thread) reads a file from S3, typically ~50 MB in size, loops through the records in the file, processes each record, builds a batch of ~5 MB, and spawns another thread (a child thread) that posts the payload (the batch) via an async REST API call to another system, which ingests it.
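For reference, the per-file flow looks roughly like the sketch below. The endpoint URL, class and method names, and the per-record processing are placeholders, not our actual code; the S3 read is omitted and records are assumed to already be streamed in:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FileProcessor {

    private static final HttpClient HTTP = HttpClient.newHttpClient();
    // Hypothetical ingestion endpoint; the real URL differs.
    private static final URI INGEST_URI = URI.create("https://ingest.example.com/batch");

    // Child threads that post each ~5 MB batch asynchronously.
    private final ExecutorService childPool = Executors.newCachedThreadPool();

    // Parent thread: loop over the records of one ~50 MB S3 file,
    // accumulate ~5 MB batches, and hand each batch to a child thread.
    public void processFile(Iterable<String> records) {
        StringBuilder batch = new StringBuilder();
        for (String record : records) {
            batch.append(process(record));
            if (batch.length() >= 5 * 1024 * 1024) { // ~5 MB batch boundary
                postBatch(batch.toString());
                batch.setLength(0);
            }
        }
        if (batch.length() > 0) {
            postBatch(batch.toString());
        }
    }

    // Child thread: fire-and-forget async REST POST of the batch payload.
    private void postBatch(String payload) {
        childPool.submit(() -> HTTP.sendAsync(
                HttpRequest.newBuilder(INGEST_URI)
                        .POST(HttpRequest.BodyPublishers.ofString(payload))
                        .build(),
                HttpResponse.BodyHandlers.discarding()));
    }

    private String process(String record) {
        return record; // placeholder for per-record processing
    }
}
```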
Based on our observations, it takes close to 6-7 seconds for one batch to be processed and posted, and the whole S3 file is processed in ~65-70 seconds. This should also be roughly how long the parent thread is active.
Based on my understanding of K8s requests and limits: with a CPU limit of 2 cores (2000 millicores) and the default cpu_period of 100 ms, the container gets a CFS quota of 200 ms of CPU time per 100 ms period, i.e. it can use 2 cores' worth of CPU every second. Since the limit is enforced on the whole container, all 40 threads share this 200 ms quota, which would mean each thread runs for only ~5 ms per period and then gets throttled. Please correct me if my understanding is wrong here.
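One way we could verify whether we are actually hitting the quota is to read the cgroup CPU statistics from inside the container. A minimal sketch, assuming cgroup v2 where the stats live in /sys/fs/cgroup/cpu.stat (on cgroup v1 nodes the equivalent file is /sys/fs/cgroup/cpu,cpuacct/cpu.stat):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ThrottleCheck {
    public static void main(String[] args) throws IOException {
        // cgroup v2 exposes nr_periods, nr_throttled and throttled_usec here;
        // on cgroup v1 the equivalent file is /sys/fs/cgroup/cpu,cpuacct/cpu.stat.
        Path cpuStat = Path.of("/sys/fs/cgroup/cpu.stat");
        List<String> lines = Files.readAllLines(cpuStat);
        for (String line : lines) {
            // nr_throttled counts the enforcement periods in which the container
            // hit its quota; throttled_usec is the total time spent throttled.
            if (line.startsWith("nr_periods")
                    || line.startsWith("nr_throttled")
                    || line.startsWith("throttled_usec")) {
                System.out.println(line);
            }
        }
    }
}
```

A steadily increasing nr_throttled relative to nr_periods would confirm the container is being CFS-throttled.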
To avoid CPU throttling:
- Should we have our application run only 2 threads?
- Or, since each thread runs for a total of ~60-70 seconds, should the application run the same number of threads as there are cores? (See the sketch after this list.)
- We couldn't figure out why we set a 2-core limit per container, but given that the EKS cluster mostly runs only our application (as one container within a pod on each node, alongside some sidecar containers), is there an advantage to running, say, 8 pods each with `limit=2 cores` on a 16 vCPU node over running 2 pods each with `limit=8 cores`?
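For the second bullet, the idea would be to size the parent thread pool from the CPU count the JVM actually sees rather than hard-coding 40; on JDK 10+ with container support enabled (the default), Runtime.getRuntime().availableProcessors() reflects the container's CPU limit. A rough sketch (the pool name and sizing are just an illustration, not our current setup):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    public static void main(String[] args) {
        // On JDK 10+ with container support (the default), this reflects the
        // cgroup CPU limit of the container rather than the node's vCPU count,
        // so with limit=2 cores it returns 2.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("JVM sees " + cores + " CPUs");

        // Hypothetical sizing: one parent worker per visible core instead of a
        // fixed 40, so the pool tracks whatever CPU limit the pod is given.
        ExecutorService parentPool = Executors.newFixedThreadPool(cores);
        // submit file-processing tasks to parentPool here ...
        parentPool.shutdown();
    }
}
```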
Thanks