Airflow KubernetesExecutor / Failed to adopt pod


We are using the Airflow KubernetesExecutor and for the most part it works great. While pods normally get terminated and disappear after a completed task, sometimes "something" happens and these completed pods end up sticking around forever, or at least until we manually kill them.

When I look in our logs, I see entry after entry like the following for these stuck pods:

Failed to adopt pod ap127331workitemhistorystreamfilifilisit.5e10fd80bbda40df8e7af5c21da88fea. Reason: (422)
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"ap127331workitemhistorystreamfilifilisit.5e10fd80bbda40df8e7af5c21da88fea\" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

I can't seem to find any rhyme or reason why some pods work fine and others get stuck. This is happening randomly with all DAGs and tasks.

Thanks so much for any help.

1 answer below

The service account assigned to your executor needs permission to patch pods. I updated the Role attached to the service account my KubernetesExecutor pods run as, adding the "patch" verb:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-executor
  namespace: airflow2
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: [
    "create",
    "delete",
    "get",
    "list",
    "patch",
    "watch"
  ]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-executor
  namespace: airflow2
subjects:
- kind: ServiceAccount
  name: airflow-sa
  apiGroup: ""
roleRef:
  kind: Role
  name: airflow-executor
  apiGroup: rbac.authorization.k8s.io

With the patch permission in place, Airflow was able to adopt and clean up its worker pods, and they no longer lingered after tasks finished.
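
As a quick sanity check, you can apply the manifests and confirm the permission took effect with kubectl's built-in authorization check. This is a minimal sketch; the file name is hypothetical, and the service account and namespace names (airflow-sa, airflow2) are the ones used in the Role/RoleBinding above:

# Apply the RBAC manifests (assumes you saved them as airflow-executor-rbac.yaml)
kubectl apply -f airflow-executor-rbac.yaml

# Verify that the executor's service account can now patch pods
kubectl auth can-i patch pods \
  --as=system:serviceaccount:airflow2:airflow-sa \
  --namespace airflow2
# Expected output: yes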