k8s job - keep a failed pod


Below is my definition for a k8s job (to convert a column of a mysql table from int->bigint using Percona's pt-online-schema-change):

apiVersion: batch/v1
kind: Job
metadata:
  name: bigint-tablename-columnname
  namespace: prod
spec:
  backoffLimit: 0
  template:
    metadata:
      name: convert-int-to-bigint-
    spec:
      containers:
      - name: percona
        image: perconalab/percona-toolkit:3.2.1
        command: [
          "/bin/bash",
          "-c",
          "pt-online-schema-change --host=dbhost --user=dbuser --password=dbpassword D=dbname,t=tablename --alter \"MODIFY COLUMN columnname BIGINT\" --alter-foreign-keys-method \"rebuild_constraints\" --nocheck-foreign-keys --execute"
        ]
        env:
        - name: SYMFONY__POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      restartPolicy: Never

The pod failed for some reason: kubectl describe job jobname shows Pods Statuses: 0 Active (0 Ready) / 0 Succeeded / 1 Failed. However, kubectl get pods shows no pod associated with the job, so I cannot view the pod logs to find out why it failed.

I thought using restartPolicy: Never would keep the pod around, as per 1 and 2, but clearly my understanding is incorrect. How do I ensure that if this process fails, the pod is kept so I can inspect it?
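By default, the pods of a failed Job are kept until the Job itself is deleted, so a missing pod usually means something cleaned it up. One thing worth checking (an assumption about this cluster, not something visible in the question) is whether a TTL is set on the Job: if ttlSecondsAfterFinished is present, the Job and its pods are deleted that many seconds after the Job finishes, whether it succeeded or failed. A sketch of the relevant fields:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bigint-tablename-columnname
  namespace: prod
spec:
  backoffLimit: 0
  # If this field is set (here, or by some cleanup controller in the
  # cluster), the Job and its pods are removed this many seconds after
  # the Job finishes -- even on failure. Leave it unset while debugging
  # so the failed pod sticks around for inspection.
  # ttlSecondsAfterFinished: 100
  template:
    spec:
      restartPolicy: Never   # the pod is not restarted, but not deleted either
      containers:
      - name: percona
        image: perconalab/percona-toolkit:3.2.1
```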

1 Answer

aboitier

If the pod fails and is deleted, you won't be able to get its logs: kubectl logs only works for resources that still exist.

One way around this is to continuously ship your logs somewhere else while the pod is alive. The Kubernetes documentation describes several strategies for this; using a cluster-level logging backend is one of them.

https://kubernetes.io/docs/concepts/cluster-administration/logging/
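As a concrete instance of the streaming-sidecar pattern described in those docs, a second container can tail the log file to its own stdout, so the output survives in the node's logging pipeline even if the main container dies. This is only a sketch under assumptions not in the question: the tool's output is redirected to a shared file, and the volume name and paths are illustrative.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bigint-tablename-columnname
  namespace: prod
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: logs            # shared scratch space for the log file
        emptyDir: {}
      containers:
      - name: percona
        image: perconalab/percona-toolkit:3.2.1
        # redirect the tool's output into the shared volume
        # (full pt-online-schema-change arguments elided)
        command: ["/bin/bash", "-c",
                  "pt-online-schema-change ... --execute &> /var/log/ptosc/run.log"]
        volumeMounts:
        - name: logs
          mountPath: /var/log/ptosc
      - name: log-streamer    # sidecar: stream the log file to its own stdout
        image: busybox:1.36
        command: ["/bin/sh", "-c", "tail -n+1 -F /var/log/ptosc/run.log"]
        volumeMounts:
        - name: logs
          mountPath: /var/log/ptosc
```

The sidecar's output can then be read with kubectl logs <pod> -c log-streamer, or collected by whatever backend the cluster ships node logs to. One caveat: a plain sidecar like this keeps running after the main container exits, so the Job never completes on its own; on Kubernetes 1.28+ a native sidecar (an init container with restartPolicy: Always) avoids that.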