In my code I simply do:
df.write.parquet(...)
It's a simple write of Parquet files to HDFS from a given DataFrame, at the end of the Spark app.
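For reference, the full call is essentially just this (a minimal sketch; the save mode and the HDFS output path are placeholders, not my real values):

// Minimal sketch of the write; save mode and path are placeholders
df.write
  .mode("overwrite")                       // placeholder save mode
  .parquet("hdfs:///some/output/path")     // placeholder HDFS output path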
When running the app, I get a "task failed while writing rows" exception, caused by a "futures timed out" error, as you can see below:
(spark.task.maxFailures is set to 1, so it's normal that a single failed task triggers an ERROR and shuts the whole app down. I could go back to the default spark.task.maxFailures of 4, but I might end up hiding the root cause of that write problem.)
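For context, this is roughly where that setting comes from (a minimal sketch; the builder code and app name are illustrative, not my exact code):

import org.apache.spark.sql.SparkSession

// Minimal sketch of where spark.task.maxFailures is set (app name is illustrative)
val spark = SparkSession.builder()
  .appName("parquet-writer-app")
  .config("spark.task.maxFailures", "1")   // fail fast: a single task failure fails the job (default is 4)
  .getOrCreate()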
Questions:
- What is the problem when writing to HDFS?
- How to increase the 10 seconds futures timeout? (I see no config set to 10 seconds in the Spark UI Environment section, so it's a bit strange; see the config sketch at the end of this post for the settings I'm considering.)
Note: it might not be related, but on other apps I also sometimes get that strange 10 seconds futures timeout, when writing to Cassandra for instance (so not HDFS), or when the executors communicate with the Spark driver (though there it's only a warning and the app continues), so I suspect a global network issue?
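In case it's relevant, these are the timeout-related settings I'm thinking of raising as a workaround (a sketch only: the values are guesses and I'm not sure which setting actually drives the 10 seconds; the executor-to-driver warning makes me suspect spark.executor.heartbeatInterval, whose default is 10s, which would not appear in the Environment tab if I understand correctly that only explicitly set properties are shown there):

import org.apache.spark.sql.SparkSession

// Sketch of candidate timeouts to raise; values are arbitrary guesses, not tested
val spark = SparkSession.builder()
  .config("spark.network.timeout", "300s")            // default 120s; base timeout for many network/RPC settings
  .config("spark.executor.heartbeatInterval", "30s")  // default 10s; must stay well below spark.network.timeout
  .config("spark.rpc.askTimeout", "300s")             // defaults to spark.network.timeout
  .getOrCreate()

That said, as mentioned above, I'd rather understand the root cause than just raise timeouts.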
