Data Loading statistics in Apache Spark

I am using Spark for ETL. Is there a way to generate loading statistics in Apache Spark (or Spark SQL), for example the number of records loaded from a text file during the load operation, as ETL tools such as DataStage usually provide?

I know that, because of Spark's lazy execution model, we can get such statistics by calling an action on an RDD, which triggers execution. That means the statistics are only available after the load has finished, whereas we want them while the data is being loaded. The logs Spark generates are not informative in this respect either. Also, calling extra actions in the middle of the ETL steps would be expensive for us, so we are wondering whether Spark offers something like SQL Server's dynamic management views (DMVs) for this purpose. Is there any workaround?
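The sketch below is a plain-Python analogy, not Spark code. It illustrates the idea behind Spark accumulators (created with `SparkContext.accumulator` and updated inside a transformation such as `map`): the counter is updated inside the lazy pipeline itself, so the single action that performs the load also produces the statistics, with no second pass such as a separate `count()`. It assumes that a chain of Python generators stands in for lazy transformations and that `list()` stands in for the action. `LoadStats`, `counted` and `parse` are hypothetical names, and the three-field CSV rule in `parse` is an invented example of a reject condition.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LoadStats:
    """Running statistics collected while records stream through a pipeline (hypothetical)."""

    def __init__(self) -> None:
        self.records_read = 0
        self.records_rejected = 0


def counted(records: Iterable[str], stats: LoadStats,
            report_every: int = 0,
            on_progress: Optional[Callable[[LoadStats], None]] = None) -> Iterator[str]:
    """Lazily yield records, counting each one as it is actually consumed."""
    for record in records:
        stats.records_read += 1
        # Optional in-flight progress report, e.g. every N records.
        if report_every and on_progress and stats.records_read % report_every == 0:
            on_progress(stats)
        yield record


def parse(records: Iterable[str], stats: LoadStats) -> Iterator[List[str]]:
    """Lazily split comma-separated lines, counting malformed ones as rejected."""
    for record in records:
        fields = record.rstrip("\n").split(",")
        if len(fields) != 3:  # assumed rule: a valid record has exactly 3 fields
            stats.records_rejected += 1
            continue
        yield fields


# Usage: building the pipeline does no work yet (lazy, like Spark transformations).
lines = ["1,a,x", "2,b,y", "bad line", "3,c,z"]
stats = LoadStats()
pipeline = parse(
    counted(lines, stats, report_every=2,
            on_progress=lambda s: print(f"read so far: {s.records_read}")),
    stats,
)
print(stats.records_read)  # 0: nothing has been consumed yet

# The "action": consuming the pipeline performs the load and fills in the stats.
loaded = list(pipeline)
print(stats.records_read, stats.records_rejected, len(loaded))  # 4 1 3
```

In Spark itself, the Spark documentation notes that accumulator updates made inside transformations may be applied more than once if a task or stage is re-executed, so such counts are best treated as approximate unless they are updated inside an action.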
