How to implement dbutils.data.summarize of Databricks?

1k Views Asked by Robin Lin At 22 June 2025 at 01:23

What do I need to do if I want to implement similar functionality myself? In particular, with precise=false, roughly how are the statistics calculated?

/**
* Summarize a Spark DataFrame and visualize the statistics to get quick insights.
*
* Example: dbutils.data.summarize(df, precise=false)
*
* @param df The dataframe to summarize. Streaming dataframes are not supported.
* @param precise If false, percentiles, distinct item counts, and frequent item counts will
* be computed approximately to reduce the run time.
* If true, distinct item counts and frequent item counts will be computed
* exactly, and percentiles will be computed with high precision.
*
* @return visualization of the computed summary statistics.
*/
summarize(df: java.lang.Object, precise: boolean): void

see: https://docs.databricks.com/dev-tools/databricks-utils.html#data-utility-dbutilsdata


There is 1 best solution below

Yiqun On 30 September 2022 at 22:40

dbutils.data.summarize(df)

df can be a PySpark or pandas DataFrame.
