I'm running a Spark job and trying to tune it to run faster. The weird thing is that the total uptime is 1.1 hours, but when I add up all the job durations, they only come to about 25 minutes. Why is the total uptime in the Spark UI not equal to the sum of all the job durations?
Here is the Spark UI information: the total uptime is 1.1 hours, but the sum of all the job durations is around 25 minutes.
Thank you very much.
`Total uptime` is the time since the Spark application (i.e. the driver program) started. `Job duration` is the time spent processing tasks on RDDs/DataFrames.

Every statement executed by the driver program contributes to the total uptime, but not necessarily to any job duration. Pure driver-side work launches no job at all, so it shows up only in the uptime. For example:
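Here is a minimal sketch of that effect (the `Thread.sleep` stands in for any driver-only work, and `local[*]` is just for running the demo locally):

```scala
import org.apache.spark.sql.SparkSession

object UptimeVsJobDuration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("uptime-vs-job-duration")
      .master("local[*]")
      .getOrCreate()

    val df = spark.range(0, 1000000)

    // Action: launches a job, so this time is counted as job duration.
    df.count()

    // Driver-only work: no job runs, so this minute adds to the
    // total uptime but to no job's duration.
    Thread.sleep(60 * 1000)

    // Second action: another job, again counted as job duration.
    df.count()

    spark.stop()
  }
}
```

In the UI, the two `count()` jobs account for only a few seconds, while the total uptime includes the full minute of driver-side sleep.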
Another example is how the spark-redshift connector works. Every query (DAG) execution that reads from or writes to Redshift issues a `COPY`/`UNLOAD` command to transfer the data from/to S3. During this operation the executors are not doing any work, and the driver program is blocked until the data transfer to S3 is completed. This time adds to the total uptime but won't show up in the `Job duration`. Further actions on the DataFrame (which now internally reads the files from S3) will add to the `Job duration`.
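To make that concrete, here is a sketch of a read through the classic `com.databricks.spark.redshift` data source; the cluster address, credentials, table name, and bucket are all placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("redshift-read")
  .getOrCreate()

// Hypothetical connection details; url, dbtable, and tempdir are placeholders.
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://examplecluster:5439/mydb?user=u&password=p")
  .option("dbtable", "my_table")
  .option("tempdir", "s3a://my-bucket/tmp/")
  .load()

// When this action runs, the connector first issues an UNLOAD that moves
// the table data from Redshift to the tempdir on S3. Per the explanation
// above, the driver waits on that transfer (adding to total uptime only),
// and only the subsequent scan of the S3 files is recorded as job duration.
df.count()
```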