Spark 2.x - How to generate simple Explain/Execution Plan


I am hoping to generate an explain/execution plan in Spark 2.2 with some actions on a DataFrame. The goal here is to ensure that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I searched the Spark documentation and SO but couldn't find syntax that worked for my situation.

Here is a simple example that works as expected:

scala> List(1, 2, 3, 4).toDF.explain
== Physical Plan ==
LocalTableScan [value#42]

Here's an example that doesn't work, but that I'm hoping to get working:

scala> List(1, 2, 3, 4).toDF.count.explain
<console>:24: error: value explain is not a member of Long
List(1, 2, 3, 4).toDF.count.explain
                               ^

And here's a more detailed example illustrating the end goal: confirming partition pruning via the explain plan.

val newDf = spark.read.parquet(path).filter(s"start >= ${startDt}").filter(s"start <= ${endDt}")

Thanks in advance for any thoughts/feedback.

There is 1 answer below.

The `count` method is an action: it is eagerly evaluated and, as the error shows, returns a `Long`, so there is no execution plan available on the result.

You have to use a lazy transformation, either:

import org.apache.spark.sql.functions.count
import spark.implicits._  // needed for the $"..." column syntax

df.select(count($"*"))

or

df.groupBy().agg(count($"*"))
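Because both forms are lazy, `explain` can be called on them before anything executes. The same trick applies to the partition-pruning goal from the question: call `explain(true)` on the filtered DataFrame and look for `PartitionFilters`/`PushedFilters` in the physical plan. A minimal sketch, assuming a local `SparkSession` and hypothetical path and column names:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._  // for .toDF and $"..." syntax

// Lazy equivalent of count: a plan exists because nothing has run yet.
val df = List(1, 2, 3, 4).toDF("value")
df.select(count($"*")).explain()

// Partition-pruning check (hypothetical path and date bounds):
// for data partitioned by `start`, the physical plan printed by
// explain(true) should list the bounds under PartitionFilters,
// e.g. PartitionFilters: [(start >= ...), (start <= ...)]
//
// spark.read.parquet("/path/to/table")
//   .filter($"start" >= startDt && $"start" <= endDt)
//   .explain(true)
```

Once the plan looks right, replacing `explain()` with `count()` (or any other action) runs the same query for real.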