I'm running the Spark operator on Kubernetes and need to sync my AWS CodeCommit repository directly, so that I can import my Python modules without having to build images with them baked in. I have already done this with GitHub by deploying an SSH key to the namespace. Now I am trying to sync using AWS credentials instead, with the YAML below:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: teste-sync-{{ macros.datetime.now().strftime("%Y-%m-%d-%H-%M-%S") }}
  namespace: processing
spec:
  volumes:
    - name: ivy
      emptyDir: {}
  sparkConf:
    extraJavaOptions: -Dcom.amazonaws.services.s3.enableV4=true
    spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.2.0,org.apache.spark:spark-avro_2.12:3.0.1"
    spark.driver.extraJavaOptions: "-Divy.cache.dir=/tmp -Divy.home=/tmp"
    spark.kubernetes.allocation.batch.size: "10"
    spark.sql.debug.maxToStringFields: "2000"
  hadoopConf:
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
    "fs.s3a.path.style.access": "True"
    "fs.s3a.connection.ssl.enabled": "True"
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: url_spark_image
  imagePullPolicy: Always
  mainApplicationFile: teste-sync.py
  sparkVersion: "3.1.2"
  restartPolicy:
    type: Never
  volumes:
    - name: ivy
      emptyDir: {}
    - name: scripts
      emptyDir: {}
  driver:
    volumeMounts:
      - name: scripts
        mountPath: /git-sync
    initContainers:
      - name: git-sync
        image: "k8s.gcr.io/git-sync/git-sync:v3.6.1"
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - name: scripts
            mountPath: /scripts
        env:
          - name: GIT_SYNC_REPO
            value: "https://git-codecommit.MY_REGION.amazonaws.com/v1/repos/MY_REPO"
          - name: GIT_SYNC_BRANCH
            value: "master"
          - name: GIT_SYNC_ROOT
            value: /dags
          - name: GIT_SYNC_DEST
            value: "main"
          - name: GIT_SYNC_ONE_TIME
            value: "true"
          - name: GIT_SYNC_SSH
            value: "false"
          - name: GIT_SYNC_AUTH
            value: "basic"
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: aws-credentials
                key: aws_access_key_id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: aws-credentials
                key: aws_secret_access_key
    env:
      - name: PYTHONPATH
        value: "$PYTHONPATH:/git-sync/main/scripts"
    envSecretKeyRefs:
      AWS_ACCESS_KEY_ID:
        name: aws-credentials
        key: aws_access_key_id
      AWS_SECRET_ACCESS_KEY:
        name: aws-credentials
        key: aws_secret_access_key
    cores: 1
    coreLimit: "1200m"
    memory: "2g"
    labels:
      version: 3.1.2
    serviceAccount: spark
    volumeMounts:
      - name: ivy
        mountPath: /tmp
  executor:
    envSecretKeyRefs:
      AWS_ACCESS_KEY_ID:
        name: aws-credentials
        key: aws_access_key_id
      AWS_SECRET_ACCESS_KEY:
        name: aws-credentials
        key: aws_secret_access_key
    cores: 1
    instances: 2
    memory: "3g"
    labels:
      version: 3.1.2
    volumeMounts:
      - name: ivy
        mountPath: /tmp
In my tests, this does not work. Can anyone help? Is there a problem with the YAML, or does this type of authentication simply not work, meaning I will have to deploy SSH?