Spark UI - History Server - Unable to view driver logs

I have deployed the Spark History Server on Azure Kubernetes Service (AKS) using a Helm chart.
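
For reference, the install was roughly the following; the chart source is omitted here, but the release name matches the pod names shown later, and in my setup the history server pod mounts the same PVC the Spark pods write to (claim code-volume) at /data:

# Hypothetical sketch: substitute the chart repo/name and values for the chart you actually use.
# The release name "my-spark-history-server" matches the history server pod shown below.
helm install my-spark-history-server <chart-repo>/spark-history-server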

Then I submitted a SparkApplication (via the Spark Operator on Kubernetes) with the YAML below:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: hello-world
spec:
  type: Python
  pythonVersion: '3'
  mode: cluster
  image: 'docker.io/bitnami/spark:3.5.0'
  imagePullPolicy: Always
  mainApplicationFile: 'local:///opt/spark/work-dir/script.py'
  sparkVersion: 3.5.0
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  volumes:
    - name: spark-script-volume
      configMap:
        name: hello-world-cm
    - name: code-volume
      persistentVolumeClaim:
        claimName: code-volume
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/data"     
    "spark.ui.threadDumpsEnabled": "true"
    "spark.history.fs.logDirectory": "file:/data"
    "spark.driver.log.persistToDfs.enabled": "true"
    "spark.driver.log.dfsDir": "file:/data" 
    "spark.history.fs.driverlog.cleaner.enabled": "true"
    "spark.history.fs.driverlog.cleaner.interval": "1d"
    "spark.history.fs.driverlog.cleaner.maxAge": "5d"
    "spark.io.compression.codec": "snappy"
  driver:
    tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
    volumeMounts:
      - name: spark-script-volume
        mountPath: /opt/spark/work-dir/script.py
        subPath: hello_world.py
      - mountPath: /data
        name: code-volume              
    cores: 1
    coreLimit: "2"
    memory: 512m
    labels:
      version: 3.5.0
    serviceAccount: my-release-spark
    securityContext:
      runAsUser: 0  # Set the user ID to 0 (root)    
  executor:
    tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
    volumeMounts:
      - mountPath: /data
        name: code-volume          
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.5.0
    securityContext:
      runAsUser: 0  # Set the user ID to 0 (root) 
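
I applied the manifest with kubectl (assuming it is saved as hello-world.yaml):

kubectl apply -f hello-world.yaml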

The Spark driver pod completed fine and I can see the driver logs using kubectl logs hello-world-driver. I can also see the driver log files on the attached PVC:
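
The listing below was produced by shelling into the history server pod (the pod name is specific to my deployment):

# shell into the history server pod
kubectl exec -it my-spark-history-server-75659fc5b5-wn2wh -- /bin/bash
# then, inside the pod:
ls -lrta /data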

root@my-spark-history-server-75659fc5b5-wn2wh:/data# ls -lrta
total 1644
drwx------ 2 root root  16384 Nov 20 01:51 lost+found
-rw-rw---- 1 root root 108320 Nov 20 05:43 local-1700458995725
-rw-rw---- 1 root root 108320 Nov 20 14:50 local-1700491856943
-rw-rw---- 1 root root 108357 Nov 20 14:54 local-1700492094743
-rw-rw---- 1 root root 113695 Nov 20 17:24 local-1700501079419
-rw-rw---- 1 root root 113695 Nov 23 08:51 local-1700729506218
drwxr-xr-x 1 root root   4096 Nov 29 11:12 ..
-rw-rw---- 1 root root 113691 Nov 29 15:46 local-1701272771644
-rw-rw---- 1 root root 113691 Nov 29 16:18 local-1701274722700
-rw-rw---- 1 root root 110842 Nov 29 17:56 local-1701280583084
-rw-rw---- 1 root root 110878 Nov 29 18:08 local-1701281298745
-rw-rw---- 1 root root 113695 Nov 29 18:19 local-1701281994847
-rwxrwx--- 1 root root    100 Nov 29 18:20 .local-1701282050932_driver.log.crc
-rwxrwx--- 1 root root  11550 Nov 29 18:20 local-1701282050932_driver.log
-rw-rw---- 1 root root 110968 Nov 29 18:20 local-1701282050932
-rwxrwx--- 1 root root    100 Nov 29 18:39 .local-1701283172349_driver.log.crc
-rwxrwx--- 1 root root  11550 Nov 29 18:39 local-1701283172349_driver.log
-rw-rw---- 1 root root 111169 Nov 29 18:39 local-1701283172349
-rwxrwx--- 1 root root    100 Nov 29 18:50 .local-1701283854944_driver.log.crc
-rwxrwx--- 1 root root  11550 Nov 29 18:50 local-1701283854944_driver.log
-rw-rw---- 1 root root 111129 Nov 29 18:50 local-1701283854944
-rwxrwx--- 1 root root    100 Nov 29 19:02 .local-1701284518591_driver.log.crc
-rwxrwx--- 1 root root  11548 Nov 29 19:02 local-1701284518591_driver.log
drwxr-xr-x 3 root root   4096 Nov 29 19:02 .
-rw-rw---- 1 root root 111159 Nov 29 19:02 local-1701284518591

However, I cannot see the driver and executor logs in the Spark History Server UI (screenshots omitted).
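
I reach the UI by port-forwarding to the history server service (the service name is assumed from my Helm release; 18080 is the default history server port):

kubectl port-forward svc/my-spark-history-server 18080:18080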

Spark History Server k8s pod logs:

23/11/29 19:07:14 INFO FsHistoryProvider: Parsing file:/data/local-1701283172349_driver.log for listing data...
23/11/29 19:07:14 ERROR FsHistoryProvider: Exception while merging application listings
org.apache.spark.SparkIllegalArgumentException: [CODEC_NOT_AVAILABLE] The codec log is not available. Consider to set the config "spark.io.compression.codec" to "snappy".
        at org.apache.spark.io.CompressionCodec$.$anonfun$createCodec$2(CompressionCodec.scala:92)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:92)
        at org.apache.spark.deploy.history.EventLogFileReader$.$anonfun$openEventLog$2(EventLogFileReaders.scala:142)
        at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
        at org.apache.spark.deploy.history.EventLogFileReader$.$anonfun$openEventLog$1(EventLogFileReaders.scala:142)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.history.EventLogFileReader$.openEventLog(EventLogFileReaders.scala:141)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$2(FsHistoryProvider.scala:1138)
        at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:47)
        at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1138)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1136)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1136)
        at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListingInternal(FsHistoryProvider.scala:796)
        at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:765)
        at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:714)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$18(FsHistoryProvider.scala:581)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)