Spark job failed to write to Alluxio due to DeadlineExceededException


I am running a Spark job that writes to an Alluxio cluster with 20 workers (Alluxio 1.6.1). The job fails to write its output with alluxio.exception.status.DeadlineExceededException, even though the worker still shows as alive in the Alluxio WebUI. How can I avoid this failure?

alluxio.exception.status.DeadlineExceededException: Timeout writing to WorkerNetAddress{host=spark-74-44.xxxx, rpcPort=51998, dataPort=51999, webPort=51997, domainSocketPath=} for request type: ALLUXIO_BLOCK
id: 3209355843338240
tier: 0
worker_group {
  host: "spark6-64-156.xxxx"
  rpc_port: 51998
  data_port: 51999
  web_port: 51997
  socket_path: ""
}

1 Answer

This error indicates that your Spark job timed out while trying to write data to an Alluxio worker. The worker may be under high load, or it may have a slow connection to your UFS (under file system).

The default timeout is 30 seconds. To increase it, set the client-side property alluxio.user.network.netty.timeout on the Spark side.

For example, to increase the timeout to 5 minutes, pass the property as a JVM option via the --conf flag of spark-submit:

$ spark-submit --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
               --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
               ...

You can also set these properties in your spark-defaults.conf file to have them automatically applied to all jobs.
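For example, a minimal sketch of the corresponding spark-defaults.conf entries (the 5-minute value is the same illustrative timeout used above; adjust it to your workload):

spark.driver.extraJavaOptions    -Dalluxio.user.network.netty.timeout=5min
spark.executor.extraJavaOptions  -Dalluxio.user.network.netty.timeout=5min

Note that if either property already carries other JVM options, append the -D flag to the existing value rather than replacing it, since Spark does not merge these settings.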

Source: https://www.alluxio.org/docs/1.6/en/Configuration-Settings.html#spark-jobs