I'm exploring Spline to determine how long Spark took to execute a pipeline (from initialising the Spark context until writing the result). I can see
"timestamp":1611397050192
in the Spline lineage file, which is actually the write time. Is there any way to get the start time of the pipeline from the Spline lineage log?
Spline doesn't capture the start time directly, but since Spline 0.6 the execution time is captured in the
ExecutionEvent.durationNs
property (the value is in nanoseconds). Since the timestamp is in milliseconds, you can calculate the start time as timestamp - durationNs / 1000000
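
As a minimal sketch of that calculation: the timestamp in the question looks like epoch milliseconds while durationNs is in nanoseconds, so the duration is divided by 1,000,000 before subtracting. The function name, the flat `event` dictionary, and the `durationNs` value below are hypothetical, chosen only for illustration; the real lineage JSON may nest these fields differently.

```python
from datetime import datetime, timezone

NANOS_PER_MILLI = 1_000_000


def pipeline_start_ms(timestamp_ms: int, duration_ns: int) -> int:
    """Estimate the start time (epoch millis) from the write timestamp
    (epoch millis) and the execution duration (nanoseconds)."""
    return timestamp_ms - duration_ns // NANOS_PER_MILLI


# Hypothetical event: the timestamp is the one from the question,
# the durationNs value (4.5 seconds) is made up for this example.
event = {"timestamp": 1611397050192, "durationNs": 4_500_000_000}

start_ms = pipeline_start_ms(event["timestamp"], event["durationNs"])
print(start_ms)  # 1611397045692

# Render the estimated start time as a readable UTC timestamp.
print(datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc).isoformat())
```

Integer division keeps the result in whole milliseconds, which matches the precision of the timestamp itself.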
There is a caveat, however. The current version of Spline captures only write actions, skipping intermediate or in-memory ones like
show(),
collect(),
etc. This means, for example, that if you call cache()
on the data frame somewhere along the way and the cache has already been materialized by an earlier action, the execution time of the subsequent write will be measured from reading the cached data, because the part of the DAG preceding the cache()
will not be triggered by the write.