Unable to run a sample word count as a Spark job


I have spark-master and spark-worker running in an SAP Kyma environment (a flavour of Kubernetes), along with JupyterLab, with ample CPU and RAM allocated.

I can access the Spark Master UI and can see that the workers are registered as well (screenshot below).

I am using Python 3 to submit the job (snippet below):

import pyspark

# Connect to the standalone master exposed by the spark-master service
conf = pyspark.SparkConf()
conf.setMaster('spark://spark-master:7077')
sc = pyspark.SparkContext(conf=conf)
sc

and I can see the Spark context printed as the output of sc. After this, I prepare the data to submit to the spark-master (snippet below):


words = 'the quick brown fox jumps over the lazy dog the quick brown fox jumps over the lazy dog'
seq = words.split()
# Distribute the words across the cluster, then count each word
data = sc.parallelize(seq)
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
dict(counts)
sc.stop()

but it starts to log warning messages in the notebook (snippet below), and they repeat forever until I kill the application from the Spark Master UI.

22/01/27 19:42:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/01/27 19:42:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
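
From what I have read, this warning usually means the executors either have no free cores/memory to offer the application or cannot connect back to the driver. A minimal sketch of pinning those settings, purely as an assumption I have not verified on this cluster (the values and the jupyter-lab hostname are guesses), would be:

import pyspark

conf = pyspark.SparkConf()
conf.setMaster('spark://spark-master:7077')
# Cap what the application requests so it fits on a single worker
# (assumes each worker advertises at least 1 core and 1 GB of RAM)
conf.set('spark.executor.memory', '1g')
conf.set('spark.cores.max', '1')
# Executors must be able to reach the driver (the Jupyter pod) over the
# pod network; 'jupyter-lab' is a hypothetical in-cluster hostname
conf.set('spark.driver.host', 'jupyter-lab')
sc = pyspark.SparkContext(conf=conf)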

I am new to Kyma (Kubernetes) and Spark. Any help would be much appreciated.

Thanks

1 answer (accepted)

For those who stumble upon the same issue:

Check your infrastructure certificates. It turned out that Kubernetes was issuing a wrong internal certificate, which was not recognised by the pods.

After we fixed the certificate, everything started working.
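
If you want to confirm which certificate a service inside the cluster is actually presenting, a quick check from the Jupyter pod is possible with Python's standard library. This is only a sketch; the host and port below are assumptions, so substitute the endpoint you suspect:

import ssl

# Hypothetical in-cluster endpoint; replace with the service you suspect
host, port = 'kubernetes.default.svc', 443
pem = ssl.get_server_certificate((host, port))
print(pem)  # inspect the issuer and validity dates, e.g. with openssl x509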