I have deployed Spark History Server in Azure Kubernetes Service (AKS) using a Helm chart.
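For context, the history server is pointed at the same PVC that the job writes its logs to. The values look roughly like this (the exact key names depend on the chart; treat this as an illustrative sketch rather than my literal values file):

# Illustrative values sketch: mount an existing PVC into the history
# server pod and read event logs from it. Key names follow the
# stable/spark-history-server chart and may differ in other charts.
pvc:
  enablePVC: true
  existingClaimName: code-volume
  eventsDir: "/data"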
Then I submitted a SparkApplication (via the Spark Operator on Kubernetes) with the YAML below:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: hello-world
spec:
  type: Python
  pythonVersion: '3'
  mode: cluster
  image: 'docker.io/bitnami/spark:3.5.0'
  imagePullPolicy: Always
  mainApplicationFile: 'local:///opt/spark/work-dir/script.py'
  sparkVersion: 3.5.0
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  volumes:
    - name: spark-script-volume
      configMap:
        name: hello-world-cm
    - name: code-volume
      persistentVolumeClaim:
        claimName: code-volume
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/data"
    "spark.ui.threadDumpsEnabled": "true"
    "spark.history.fs.logDirectory": "file:/data"
    "spark.driver.log.persistToDfs.enabled": "true"
    "spark.driver.log.dfsDir": "file:/data"
    "spark.history.fs.driverlog.cleaner.enabled": "true"
    "spark.history.fs.driverlog.cleaner.interval": "1d"
    "spark.history.fs.driverlog.cleaner.maxAge": "5d"
    "spark.io.compression.codec": "snappy"
  driver:
    tolerations:
      - key: "kubernetes.azure.com/scalesetpriority"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
    volumeMounts:
      - name: spark-script-volume
        mountPath: /opt/spark/work-dir/script.py
        subPath: hello_world.py
      - mountPath: /data
        name: code-volume
    cores: 1
    coreLimit: "2"
    memory: 512m
    labels:
      version: 3.5.0
    serviceAccount: my-release-spark
    securityContext:
      runAsUser: 0 # Set the user ID to 0 (root)
  executor:
    tolerations:
      - key: "kubernetes.azure.com/scalesetpriority"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
    volumeMounts:
      - mountPath: /data
        name: code-volume
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.5.0
    securityContext:
      runAsUser: 0 # Set the user ID to 0 (root)
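For completeness, the script is mounted from the hello-world-cm ConfigMap referenced above. The actual application code isn't relevant to the problem; a minimal sketch of the ConfigMap, with a trivial placeholder job, looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world-cm
data:
  hello_world.py: |
    # Placeholder job; the real script just needs to run a Spark action
    # so that event logs and driver logs get written to /data.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hello-world").getOrCreate()
    spark.range(10).show()
    spark.stop()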
The Spark driver pod completes fine, and I can see the driver logs using kubectl logs hello-world-driver. I can also see both the event logs (local-*) and the driver logs (*_driver.log) on the attached PVC:
root@my-spark-history-server-75659fc5b5-wn2wh:/data# ls -lrta
total 1644
drwx------ 2 root root 16384 Nov 20 01:51 lost+found
-rw-rw---- 1 root root 108320 Nov 20 05:43 local-1700458995725
-rw-rw---- 1 root root 108320 Nov 20 14:50 local-1700491856943
-rw-rw---- 1 root root 108357 Nov 20 14:54 local-1700492094743
-rw-rw---- 1 root root 113695 Nov 20 17:24 local-1700501079419
-rw-rw---- 1 root root 113695 Nov 23 08:51 local-1700729506218
drwxr-xr-x 1 root root 4096 Nov 29 11:12 ..
-rw-rw---- 1 root root 113691 Nov 29 15:46 local-1701272771644
-rw-rw---- 1 root root 113691 Nov 29 16:18 local-1701274722700
-rw-rw---- 1 root root 110842 Nov 29 17:56 local-1701280583084
-rw-rw---- 1 root root 110878 Nov 29 18:08 local-1701281298745
-rw-rw---- 1 root root 113695 Nov 29 18:19 local-1701281994847
-rwxrwx--- 1 root root 100 Nov 29 18:20 .local-1701282050932_driver.log.crc
-rwxrwx--- 1 root root 11550 Nov 29 18:20 local-1701282050932_driver.log
-rw-rw---- 1 root root 110968 Nov 29 18:20 local-1701282050932
-rwxrwx--- 1 root root 100 Nov 29 18:39 .local-1701283172349_driver.log.crc
-rwxrwx--- 1 root root 11550 Nov 29 18:39 local-1701283172349_driver.log
-rw-rw---- 1 root root 111169 Nov 29 18:39 local-1701283172349
-rwxrwx--- 1 root root 100 Nov 29 18:50 .local-1701283854944_driver.log.crc
-rwxrwx--- 1 root root 11550 Nov 29 18:50 local-1701283854944_driver.log
-rw-rw---- 1 root root 111129 Nov 29 18:50 local-1701283854944
-rwxrwx--- 1 root root 100 Nov 29 19:02 .local-1701284518591_driver.log.crc
-rwxrwx--- 1 root root 11548 Nov 29 19:02 local-1701284518591_driver.log
drwxr-xr-x 3 root root 4096 Nov 29 19:02 .
-rw-rw---- 1 root root 111159 Nov 29 19:02 local-1701284518591
However, I cannot see the driver and executor logs in the Spark History Server UI.

Spark History Server Kubernetes pod logs:
23/11/29 19:07:14 INFO FsHistoryProvider: Parsing file:/data/local-1701283172349_driver.log for listing data...
23/11/29 19:07:14 ERROR FsHistoryProvider: Exception while merging application listings
org.apache.spark.SparkIllegalArgumentException: [CODEC_NOT_AVAILABLE] The codec log is not available. Consider to set the config "spark.io.compression.codec" to "snappy".
at org.apache.spark.io.CompressionCodec$.$anonfun$createCodec$2(CompressionCodec.scala:92)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:92)
at org.apache.spark.deploy.history.EventLogFileReader$.$anonfun$openEventLog$2(EventLogFileReaders.scala:142)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
at org.apache.spark.deploy.history.EventLogFileReader$.$anonfun$openEventLog$1(EventLogFileReaders.scala:142)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.history.EventLogFileReader$.openEventLog(EventLogFileReaders.scala:141)
at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$2(FsHistoryProvider.scala:1138)
at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:47)
at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1138)
at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1136)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1136)
at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListingInternal(FsHistoryProvider.scala:796)
at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:765)
at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:714)
at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$18(FsHistoryProvider.scala:581)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)