I am running a Spark job that writes to an Alluxio cluster with 20 workers (Alluxio 1.6.1). The Spark job failed to write its output due to an alluxio.exception.status.DeadlineExceededException. The worker is still shown as alive in the Alluxio WebUI. How can I avoid this failure?
alluxio.exception.status.DeadlineExceededException: Timeout writing to WorkerNetAddress{host=spark-74-44.xxxx, rpcPort=51998, dataPort=51999, webPort=51997, domainSocketPath=} for request type: ALLUXIO_BLOCK
id: 3209355843338240
tier: 0
worker_group {
host: "spark6-64-156.xxxx"
rpc_port: 51998
data_port: 51999
web_port: 51997
socket_path: ""
}
This error indicates that your Spark job timed out while trying to write data to an Alluxio worker. The worker could be under high load, or have a slow connection to its under file system (UFS).
The default timeout is 30 seconds. To increase it, configure alluxio.user.network.netty.timeout on the Spark side. For example, to increase the timeout to 5 minutes, pass the property through the --conf option of spark-submit, as in the sketch below.
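A sketch of the spark-submit invocation. The application class and jar are placeholders, and note that some 1.x releases name the property alluxio.user.network.netty.timeout.ms, so check the property list for your exact Alluxio version:

    $ spark-submit \
        --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
        --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
        --class com.example.MyAlluxioJob \
        my-alluxio-job.jar

Setting the property on both the driver and the executors matters, because the executors are the processes that actually write blocks to the Alluxio workers.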
You can also set these properties in your spark-defaults.conf file to have them automatically applied to all jobs; a minimal example follows.
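The equivalent spark-defaults.conf entries, under the same assumptions about the property name and the 5min duration syntax as above:

    spark.driver.extraJavaOptions   -Dalluxio.user.network.netty.timeout=5min
    spark.executor.extraJavaOptions -Dalluxio.user.network.netty.timeout=5min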
Source: https://www.alluxio.org/docs/1.6/en/Configuration-Settings.html#spark-jobs