Spark job on Spark Kubernetes cluster took a long time to complete


I have set up a 3-node Spark Kubernetes cluster with the spark-kubernetes-operator Helm chart. The Kubernetes cluster is deployed on AWS t2.2xlarge instances, each with 8 vCPUs and 32 GB of memory.
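
The operator itself was installed with Helm, roughly like this (the chart repo URL and release name follow the commonly used spark-on-k8s-operator docs and are assumptions here; the namespace matches the one used in the manifest below):

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo update
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace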

I have built a RandomForest price-prediction Spark pipeline in Scala and run it on this cluster; a rough sketch of the pipeline code is shown after the manifest below. The training dataset (a CSV file) contains around 100,000 records. The following SparkApplication is used to run the Spark job.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-job
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "erangaeb/spark-app:1.18"
  imagePullPolicy: Always
  mainClass: com.rahasak.sparkapp.Tea3RandomForest
  mainApplicationFile: "local:///app/spark-app.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  sparkConf:
    "spark.ui.port": "4041"
    "spark.kubernetes.local.dirs.tmpfs": "true"
    "spark.local.dir": "/mnt/data"
  dynamicAllocation:
    enabled: true
  driver:
    cores: 1
    memory: "18g"
    labels:
      version: 3.1.1
    serviceAccount: tea3-spark
    volumeMounts:
      - name: "data-volume"
        mountPath: "/mnt/data"
  executor:
    cores: 2
    memory: "24g"
    instances: 4
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "data-volume"
        mountPath: "/mnt/data"
  volumes:
    - name: "data-volume"
      persistentVolumeClaim:
        claimName: rahasak-pvc
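
For reference, a minimal sketch of what the Tea3RandomForest pipeline might look like is below. The CSV path, feature column names (grade, quantity, month), and label column (price) are assumptions for illustration; only the class name and the /mnt/data mount path come from the manifest above.

package com.rahasak.sparkapp

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.sql.SparkSession

object Tea3RandomForest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tea3-random-forest")
      .getOrCreate()

    // read the training CSV from the PVC mount (file name is assumed)
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/data/training.csv")

    // assumed columns: "grade" is categorical, "price" is the label
    val indexer = new StringIndexer()
      .setInputCol("grade")
      .setOutputCol("gradeIndex")
      .setHandleInvalid("keep")

    val assembler = new VectorAssembler()
      .setInputCols(Array("gradeIndex", "quantity", "month"))
      .setOutputCol("features")

    val rf = new RandomForestRegressor()
      .setLabelCol("price")
      .setFeaturesCol("features")
      .setNumTrees(50)

    val pipeline = new Pipeline().setStages(Array(indexer, assembler, rf))

    // train/test split, fit the pipeline, and report RMSE on the held-out set
    val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42)
    val model = pipeline.fit(train)

    val predictions = model.transform(test)
    val rmse = new RegressionEvaluator()
      .setLabelCol("price")
      .setPredictionCol("prediction")
      .setMetricName("rmse")
      .evaluate(predictions)

    println(s"RMSE = $rmse")
    spark.stop()
  }
}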

It took 12 days to complete this job. Any idea what a reasonable completion time would be for a Spark job like this? I assume 12 days is far too long. Are there any optimizations I could make to reduce the job time?
