Issue Deploying Git Sync DAG to Airflow on Kubernetes


I have been trying to deploy the Git Sync DAG (v3.4.0) to my instance of Airflow (v2.4.1 with Helm chart version 1.7.0) running on a Kubernetes cluster (v1.23.7+rke2r2).

I followed the deployment instructions from the Airflow documentation which can be found here.

My override-values.yaml is the following.

dags:
  gitSync:
    enabled: true
    repo: [email protected]/MY_COMPANY_NAME/MY_COMPANY-dags.git
    branch: main
    subPath: ""
    sshKeySecret: airflow-ssh-secret
extraSecrets:
  airflow-ssh-secret:
    data: |
      gitSshKey: 'MY_PRIVATE_KEY_IN_B64'
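
For reference, the gitSshKey value above is the private key encoded as base64. A minimal sketch of producing such a value as a single unbroken line follows; the function name and the key path in the example are placeholders, not part of the original setup.

```shell
# Hypothetical helper: encode a private key file as one base64 line.
# The key path passed in is a placeholder chosen by the caller.
encode_ssh_key() {
    key_file="$1"
    # Strip newlines so the result can be pasted as a single YAML value.
    base64 < "$key_file" | tr -d '\n'
}

# Example: encode_ssh_key ~/.ssh/id_ed25519
```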

Once Airflow is stable, I use the following helm command to update my Airflow deployment.

helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f override-values.yaml

Rancher Failures Screenshot

The helm command succeeds, but the deployment never reaches a new stable state with the git-sync containers: the git-sync-init container repeatedly fails to complete. I previously used this approach to deploy git-sync and it worked for months, but it suddenly stopped working. When I try to check the logs for the git-sync-init container, they are empty, and there doesn't seem to be a verbosity attribute I can enable.

After reading through github issues on the git-sync repo, I also attempted to prepend the ssh:// scheme to the repo url, but that did not fix the issue.

Is there an alternative way for me to deploy a git-sync sidecar container to my Airflow deployment so that I can access code from private repos?

EDIT:

It appears the issue was actually with the Rancher GUI. Whenever I used the GUI, the container logs and shell would not load or show anything. However, I was able to open a kubectl shell, query for the Airflow pods with kubectl get pods -n airflow, and query for the specific init container logs with kubectl logs airflow-scheduler-65fcdbb58d-4pnzf git-sync -n airflow.
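
The log lookup described here can be wrapped in a small helper. This is only a sketch: the function name is made up, and the default namespace and container name are taken from the names mentioned in this question, so they may differ in other deployments. The -c flag of kubectl logs selects the container by name.

```shell
# Hypothetical helper: print logs for one container of a pod.
# Defaults (namespace "airflow", container "git-sync-init") are assumptions
# based on the names used in this question.
git_sync_logs() {
    pod="$1"
    namespace="${2:-airflow}"
    container="${3:-git-sync-init}"
    kubectl logs "$pod" -c "$container" -n "$namespace"
}

# Example: git_sync_logs airflow-scheduler-65fcdbb58d-4pnzf airflow git-sync
```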

This yielded the following error.

"msg"="unexpected error syncing repo, will retry" "error"="Run(git submodule update --init --recursive --depth 2): exit status 128: { stdout: "", stderr: "fatal: No url found for submodule path 'COMPANY_NAME/PACKAGE_PATH/PACKAGE' in .gitmodules\n" }"

This pointed to a misconfigured .gitmodules that was not updated when the structure of our dag repo was changed.
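
A mismatch of this kind can be checked locally before pushing. The sketch below lists every submodule path recorded in the index that has no url entry in .gitmodules, which is the condition the error message complains about. The function name is hypothetical, and it assumes submodule names and paths contain no whitespace.

```shell
# Hypothetical helper: print every submodule path that is recorded in the
# index (as a gitlink, mode 160000) but has no url entry in .gitmodules.
# Assumes submodule names and paths contain no whitespace.
missing_submodule_urls() {
    repo_dir="${1:-.}"
    gitmodules="$repo_dir/.gitmodules"

    git -C "$repo_dir" ls-files --stage |
    while read -r mode _sha _stage path; do
        # Only gitlink entries are submodules.
        [ "$mode" = "160000" ] || continue

        # Find the submodule name whose "path" setting matches this entry.
        name=$(git config -f "$gitmodules" --get-regexp '^submodule\..*\.path$' 2>/dev/null |
            while read -r key value; do
                if [ "$value" = "$path" ]; then
                    key=${key#submodule.}
                    printf '%s\n' "${key%.path}"
                    break
                fi
            done)

        # Look up the url for that name; an empty result is reported.
        url=""
        if [ -n "$name" ]; then
            url=$(git config -f "$gitmodules" --get "submodule.$name.url" 2>/dev/null)
        fi
        [ -n "$url" ] || printf '%s\n' "$path"
    done
}

# Example: missing_submodule_urls /path/to/dag-repo
```

Any path it prints needs either a matching entry in .gitmodules or removal of the stale gitlink from the index.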
