Spark-operator: infinite spark-pi-driver running with no executors


I struggled for a long time to install the Spark Operator, following this guide: https://medium.com/@SaphE/deploying-apache-spark-on-kubernetes-using-helm-charts-simplified-cluster-management-and-ee5e4f2264fd. After applying the manifests, the spark-pi-driver pod stays in the Running state indefinitely and no executor pods are ever created. I adjusted the permissions in the spark-rbac.yaml file:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-role
rules:
- apiGroups: ["", "sparkoperator.k8s.io"]
  resources: ["events", "pods", "services", "configmaps", "sparkapplications/status", "scheduledsparkapplications", "sparkapplications"]
  verbs: ["*"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
subjects:
- kind: ServiceAccount
  name: spark
  namespace: spark-jobs
roleRef:
  kind: ClusterRole
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
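
For this binding to have any effect, the spark ServiceAccount it references has to exist. A minimal sketch of that manifest (the namespace mirrors the binding's subject; it has to match the namespace the driver pod actually runs in for the permissions to apply):

# ServiceAccount referenced by the ClusterRoleBinding above.
# Assumption: spark-jobs mirrors the subject of the binding; this
# namespace and the driver pod's namespace must agree.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-jobs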

The "spark-pi.yaml" file looks like this:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "ghcr.io/googlecloudplatform/spark-operator:v1beta2-1.3.4-3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
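
I submit and watch the application with the usual commands (the namespace is taken from the manifest above; adjust it if yours differs):

kubectl apply -f spark-pi.yaml
kubectl -n spark-operator get pods -w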

The image "gcr.io/spark-operator/spark:v3.1.1" from the guide no longer works due to recent changes on Google's side, so I substituted the image shown above. I need the application to complete successfully. Here are the logs from the "spark-pi-driver" pod:

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ echo 0
0
+ echo 0
0
+ echo root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator driver --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
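
For completeness, this is how the stuck state can be inspected (standard kubectl commands; the namespace is taken from the manifest above):

kubectl -n spark-operator describe sparkapplication spark-pi
kubectl -n spark-operator describe pod spark-pi-driver
kubectl -n spark-operator logs spark-pi-driver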

How can I resolve this so that the application runs to completion?

I tried to find other working images as alternatives to "gcr.io/spark-operator/spark:v3.1.1". One of them is used in the manifest above, and another is "wanghualei/spark-operator_spark-operator".


1 Answer


Basically, you should use an apache/spark image for the examples to work. The image you are currently using is the operator's own image, not a Spark runtime image. See this spark-on-k8s-operator issue: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1888
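
A minimal sketch of the fields that would change in spark-pi.yaml, assuming a tag such as apache/spark:3.5.0 from Docker Hub (pick any published tag and keep sparkVersion and the examples jar version in sync with it):

spec:
  # Assumption: apache/spark:3.5.0 is one of the published tags on
  # Docker Hub; any available tag works as long as the versions
  # below match it.
  image: "apache/spark:3.5.0"
  sparkVersion: "3.5.0"
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  # The "version" labels on the driver and executor should be
  # updated to match as well.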