ALS job failing occasionally


I have written a recommendation job in PySpark that uses the ALS algorithm (from pyspark.ml.recommendation import ALS). The job works fine most of the time, but on some days it fails for a reason I haven't been able to identify. The rough shape of the job is sketched below, followed by the error I am getting.
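To make the question concrete, this is roughly what the job does (column names, the input path, and the app name are placeholders, not my actual code):

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("PosterRecommendation").getOrCreate()

# ratings: one row per (user, item) interaction with an explicit score
ratings = spark.read.parquet("/path/to/ratings")

als = ALS(
    userCol="user_id",
    itemCol="item_id",
    ratingCol="rating",
    coldStartStrategy="drop",
)
model = als.fit(ratings)
recommendations = model.recommendForAllUsers(10)

The full traceback from a failed run: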

Traceback (most recent call last):
 File "/opt/prism/src/main.py", line 79, in <module>
   res = job.run()
 File "SparkJob.py", line 44, in run
   self.start()
 File "SparkJob.py", line 70, in start
   raise e
 File "SparkJob.py", line 67, in start
   self.execute(self.input_data, 1)
 File "PosterRecommendation.py", line 467, in execute
   if df.count() != 0:
 File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 585, in count
   return int(self._jdf.count())
 File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
   return_value = get_return_value(
 File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 128, in deco
   return f(*a, **kw)
 File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1399.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 207.1 failed 4 times, most recent failure: Lost task 0.3 in stage 207.1 (TID 29840, 10.100.53.130, executor 1): java.lang.ArrayIndexOutOfBoundsException
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2135)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2154)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2

I've tried removing null values from the dataset, but it still fails. I've also tried allocating extra resources to the job, with no change.
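For reference, the null handling I tried looks roughly like this (column names are placeholders); the count() call is the one that fails in the traceback above:

# drop rows where any of the id/rating columns is null before fitting ALS
df = df.dropna(subset=["user_id", "item_id", "rating"])

if df.count() != 0:
    model = als.fit(df)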
