How to find the time for each step of the process in Spark


I use the following function written in PySpark to log steps in a SQL table.

from pyspark.sql.functions import from_utc_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

def log_df_update(spark,IsComplete,Status,EndDate,ErrorMessage,RowCounts,StartDate,FilePath,tablename):
        # process_name_log and process_id_log are broadcast variables; sc, url and properties are defined elsewhere
        l = [(process_name_log.value,process_id_log.value,IsComplete,Status,StartDate,EndDate,ErrorMessage,int(RowCounts),FilePath)]
        schema = StructType([
            StructField("SourceName", StringType(), True),StructField("SourceID", IntegerType(), True),StructField("IsComplete", IntegerType(), True),
            StructField("Status", StringType(), True),StructField("StartDate", TimestampType(), True),StructField("EndDate", TimestampType(), True),
            StructField("ErrorMessage", StringType(), True),StructField("RowCounts", IntegerType(), True),StructField("FilePath", StringType(), True)])
        rdd_l = sc.parallelize(l)
        log_df = spark.createDataFrame(rdd_l,schema)
        # Shift both timestamps from UTC to PST, then append the single log row to the SQL table over JDBC
        log_df.withColumn("StartDate",from_utc_timestamp(log_df.StartDate,"PST")).withColumn("EndDate",from_utc_timestamp(log_df.EndDate,"PST")).write.jdbc(url=url, table=tablename,mode="append", properties=properties)

The SQL table that holds this log is loaded over a JDBC connection by a separate function.

I write a log entry after every step in the process, whether it failed or not, to track both run time and completion. Before every step I store the current time in the variable start_date. What I don't know is whether the time is being spent in the step that is being computed or in the logging of the process status. Since the start date-time is taken at the moment the snippet starts, I can't say which of the two is taking the time.

Reading data from the same table in SQL Server takes a few seconds, so would writing into the table take about the same time?

I tried running explain() after logging, but that only shows what happened up to the step before the logging itself. How can I track the time taken for each step in PySpark?
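To illustrate what I am after, here is a minimal sketch of timing the step and the logging call separately, so the two durations are not lumped together. The names timed, timings, run_step and write_log are hypothetical and not part of my job; the two functions are plain-Python stand-ins for a Spark step and for the log_df_update call. Because Spark transformations are lazy, I assume a real step would have to end in an action (for example a count or a write) inside the timed block for the measurement to mean anything.

```python
import time
from contextlib import contextmanager

# Hypothetical store for wall-clock durations, keyed by label.
timings = {}

@contextmanager
def timed(label):
    """Measure the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the wrapped block raises.
        timings[label] = time.perf_counter() - start

def run_step():
    # Stand-in for a Spark step. In a real job this must end in an action
    # (e.g. df.count() or a write), otherwise nothing is actually computed.
    time.sleep(0.05)
    return 100

def write_log(row_count):
    # Stand-in for the log_df_update(...) call that appends over JDBC.
    time.sleep(0.01)

with timed("step"):
    rows = run_step()

with timed("logging"):
    write_log(rows)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

With something like this, the duration recorded under "step" could be passed into the log row, and the duration under "logging" would show how much the JDBC append itself costs.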
