What do I need to do to get similar functionality myself?
In particular, with precise=false, roughly how are the statistics calculated?
/**
* Summarize a Spark DataFrame and visualize the statistics to get quick insights.
*
* Example: dbutils.data.summarize(df, precise=false)
*
* @param df The dataframe to summarize. Streaming dataframes are not supported.
* @param precise If false, percentiles, distinct item counts, and frequent item counts will
* be computed approximately to reduce the run time.
* If true, distinct item counts and frequent item counts will be computed
* exactly, and percentiles will be computed with high precision.
*
* @return visualization of the computed summary statistics.
*/
summarize(df: java.lang.Object, precise: boolean): void
see: https://docs.databricks.com/dev-tools/databricks-utils.html#data-utility-dbutilsdata
dbutils.data.summarize(df)
df can be a PySpark or pandas DataFrame.