How can I implement something like dbutils.data.summarize from Databricks?


What do I need to do to achieve similar functionality myself? In particular, with precise=false, roughly how are the statistics calculated?

/**
* Summarize a Spark DataFrame and visualize the statistics to get quick insights.
*
* Example: dbutils.data.summarize(df, precise=false)
*
* @param df The dataframe to summarize. Streaming dataframes are not supported.
* @param precise If false, percentiles, distinct item counts, and frequent item counts will
* be computed approximately to reduce the run time.
* If true, distinct item counts and frequent item counts will be computed
* exactly, and percentiles will be computed with high precision.
*
* @return visualization of the computed summary statistics.
*/
summarize(df: java.lang.Object, precise: boolean): void

see: https://docs.databricks.com/dev-tools/databricks-utils.html#data-utility-dbutilsdata

There is 1 best solution below

dbutils.data.summarize(df)

Here df can be either a PySpark DataFrame or a pandas DataFrame.
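
As for how precise=false might be calculated: Databricks does not document the internals of summarize, so what follows is only an assumption about the general approach. Open-source Spark exposes approximate building blocks for the same three statistics: `approx_count_distinct` (HyperLogLog++), `DataFrame.approxQuantile` (a Greenwald-Khanna variant), and `DataFrame.freqItems` (a frequent-element counting algorithm). To give a feel for the frequent-items part, here is a minimal pure-Python sketch of a Misra-Gries style summary. The function name `approx_frequent_items` and the parameter `k` are hypothetical and not part of any Databricks or Spark API.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def approx_frequent_items(values: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries summary: one pass over the data, at most k - 1 counters.

    Any value occurring more than n / k times among n values is guaranteed
    to be one of the returned keys. The returned counts are lower bounds,
    and some returned keys may be false positives.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for value in values:
        if value in counters:
            counters[value] += 1
        elif len(counters) < k - 1:
            counters[value] = 1
        else:
            # No free slot: decrement every counter and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def exact_frequent_items(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Exact counts for comparison; needs memory for every distinct value."""
    return dict(Counter(values))
```

For the values `list("aabacadaea")` with `k=3`, the approximate version returns `{'a': 4}` while the exact count of `a` is 6. The heavy hitter is found with a single pass and only two counters, at the cost of an underestimated count. That is the kind of trade-off an approximate mode makes to reduce run time.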