Finding Spark pipeline start time from Spline lineage

I'm exploring Spline to determine how long Spark took to execute a pipeline (from initialising the Spark context to writing the result). I can see

"timestamp":1611397050192

in the Spline lineage file, which is actually the write time. Is there any way to get the start time of the pipeline from the Spline lineage log?

There is 1 best solution below

Spline doesn't capture the start time directly, but since Spline 0.6 the execution time is captured in the ExecutionEvent.durationNs property (the value is in nanoseconds). Because timestamp is in milliseconds, convert the duration to milliseconds before subtracting: start time = timestamp - durationNs / 1000000

{
  planId: "1214f38d-c2c9-4155-963b-f92d91dac4fa",
  timestamp: 1614094012617,
  durationNs: 69208608168,
}
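
As a rough illustration of the arithmetic, here is a minimal Python sketch applied to the sample event above. It assumes that timestamp is epoch milliseconds marking the end (write) time, as described in the question, and that durationNs is in nanoseconds. The function name start_time_ms and the event dictionary are hypothetical helpers for this example, not part of any Spline API.

```python
from datetime import datetime, timezone

NANOS_PER_MILLI = 1_000_000


def start_time_ms(timestamp_ms: int, duration_ns: int) -> int:
    """Estimate the start time in epoch milliseconds.

    Assumes timestamp_ms is the write (end) time in epoch milliseconds
    and duration_ns is the execution duration in nanoseconds.
    """
    # Convert nanoseconds to whole milliseconds before subtracting.
    return timestamp_ms - duration_ns // NANOS_PER_MILLI


# Values copied from the sample execution event shown above.
event = {"timestamp": 1614094012617, "durationNs": 69208608168}

started_ms = start_time_ms(event["timestamp"], event["durationNs"])
started_at = datetime.fromtimestamp(started_ms / 1000, tz=timezone.utc)

print(started_ms)              # 1614093943409
print(started_at.isoformat())  # the same instant as a UTC timestamp
```

Integer division drops the sub-millisecond remainder, which should be fine here because the timestamp itself only has millisecond resolution.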

One caveat, however: the current version of Spline captures only write actions, skipping intermediate or memory-only ones such as show(), collect(), etc. This means, for example, that if you call cache() somewhere on the data frame and an earlier action has already materialised the cache, the execution time of the subsequent write will be measured from reading the cached data, because the part of the DAG preceding the cache() will not be triggered by the write.