I am using KEDA to auto-scale my ML workload pods. The pods scale based on the number of queue items, but KEDA scales them down even while the pods are still processing queue workloads. I have set terminationGracePeriodSeconds, but it ends up either too high or too low for our workload.
I have read about the SIGTERM signal and preStop hooks, but I could not find any sample that solves this problem using the KEDA queue Helm chart.
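For reference, this is the shape of configuration I have been experimenting with (the container name, image, and sleep duration are placeholders, not our real setup):

```yaml
# Pod spec fragment (sketch only; values are illustrative)
spec:
  # How long the kubelet waits after sending SIGTERM before sending SIGKILL.
  terminationGracePeriodSeconds: 300
  containers:
    - name: ml-worker              # placeholder name
      image: example/ml-worker     # placeholder image
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM reaches the container process;
            # here it just delays shutdown briefly.
            command: ["/bin/sh", "-c", "sleep 30"]
```

The problem is picking a terminationGracePeriodSeconds that covers our longest queue item without making every scale-down drag on.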
We fixed the issue by increasing terminationGracePeriodSeconds in deployment.yaml. This might not be the right fix, but it worked for us. Hope it helps someone who has the same issue.

If you are using a Function App, also decrease the batch size, so a pod is not caught up processing for too long.
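As a rough sketch, the change amounts to something like this in deployment.yaml (600 is just an example value; tune it to the longest time a single queue item can take):

```yaml
# Deployment fragment (sketch; only the relevant field shown)
spec:
  template:
    spec:
      # Give an in-flight queue item time to finish after SIGTERM
      # before the kubelet force-kills the pod.
      terminationGracePeriodSeconds: 600
```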