Kubernetes KEDA ScaledJob is not responding


We are using Azure DevOps agents configured in an AKS cluster with KEDA ScaledJobs. The AKS node pool SKU is Standard_E8ds_v5 (1 instance), and we are using a persistent volume backed by an Azure disk.
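
For reference, the PVC referenced by the ScaledJob below would look roughly like this; the storage class and size here are illustrative, not taken from the actual setup:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-disk-pvc
  namespace: ado
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-csi   # AKS built-in Azure disk CSI class (assumed)
  resources:
    requests:
      storage: 100Gi              # illustrative size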

The ScaledJob spec is as below:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  annotations:
  name: azdevops-scaledjob
  namespace: ado
spec:
  failedJobsHistoryLimit: 5
  jobTargetRef:
    template:
      spec:
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - preference:
                matchExpressions:
                - key: kubernetes.azure.com/mode
                  operator: In
                  values:
                  - mypool
                - key: topology.disk.csi.azure.com/zone
                  operator: In
                  values:
                  - westeurope-1
              weight: 2
        containers:
        - env:
          - name: AZP_URL
            value: https://azuredevops.xxxxxxxx/xxxxxxx/organisation
          - name: AZP_TOKEN
            value: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
          - name: AZP_POOL
            value: az-pool
          image: xxxxxxxxxxxxxx.azurecr.io/vsts/dockeragent:xxxxxxxxx
          imagePullPolicy: Always
          name: azdevops-agent-job
          resources:
            limits:
              cpu: 1500m
              memory: 6Gi
            requests:
              cpu: 500m
              memory: 3Gi
          securityContext:
            allowPrivilegeEscalation: true
            privileged: true
          volumeMounts:
          - mountPath: /mnt
            name: ado-cache-storage
        volumes:
        - name: ado-cache-storage
          persistentVolumeClaim:
            claimName: azure-disk-pvc
  maxReplicaCount: 8
  minReplicaCount: 1
  pollingInterval: 30
  successfulJobsHistoryLimit: 5
  triggers:
  - metadata:
      organizationURLFromEnv: AZP_URL
      personalAccessTokenFromEnv: AZP_TOKEN
      poolID: "xxxx"
    type: azure-pipelines

But we noticed a strange behavior when trying to trigger a build. Error message in the pipeline:

"We stopped hearing from agent azdevops-scaledjob-xxxxxxx. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error".

The pipeline stays in a hung state and continues without reporting an error, but in the backend the pod is already in an error state. So each time this occurs we have to cancel the pipeline and initiate a new build, so that the pipeline gets scheduled onto an available pod.

On describing the pod that is in the error state, we could identify this:

azdevops-scaledjob-6xxxxxxxx-b   0/1     Error     0          27h

The pod has the error below:

Annotations:  <none>
Status:       Failed
Reason:       Evicted
Message:      The node was low on resource: ephemeral-storage. Container azdevops-agent-job was using 23001896Ki, which exceeds its request of 0.
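
The eviction reason shows the agent container wrote roughly 22 GiB to the node's ephemeral storage while declaring no request for it, so the kubelet evicted it under disk pressure. One mitigation (a sketch, not part of the original setup) is to declare ephemeral-storage alongside cpu and memory so the scheduler and kubelet account for it; the sizes are assumptions:

resources:
  requests:
    cpu: 500m
    memory: 3Gi
    ephemeral-storage: 10Gi   # reserve node disk for the agent workspace (illustrative)
  limits:
    cpu: 1500m
    memory: 6Gi
    ephemeral-storage: 25Gi   # evict only this pod, predictably, past this point (illustrative)

Alternatively, pointing the agent's work directory at the mounted Azure disk (/mnt above) would keep build output off the node disk entirely, if the agent image allows configuring it.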

1 Answer


I have set safe-to-evict to false, so AKS won't relocate the pod/job during a node downscale.

The drawback is that AKS can end up keeping more nodes than needed, so you must ensure the pod/job won't run there forever (one option is sketched after the example below).

spec:
  jobTargetRef:
    template:
      metadata:
        annotations:
          "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"

Another possibility is to change the node downscale timeout, i.e. how long a node must sit unneeded before the autoscaler may remove it.

Terraform code:

  auto_scaler_profile {
    scale_down_unneeded = "90m"
  }