How to pretty-print a Spark DDL string?


The schema string I get from .toDDL is succinct but very hard to read for complex schemas. How can it be formatted with indentation and line breaks so that it is easier on the eyes?

Answer by s.polam

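I believe there is no built-in function to format DDL.

I use the code below to format the DDL. It adds a printDDL method to df.schema, or to any StructType object. It works by creating a table from the schema and printing the column lines of its SHOW CREATE TABLE output, so note that it drops and recreates a table named _source in the current database.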

scala> df.printSchema
root
 |-- author: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- category: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- editor: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- isbn: string (nullable = true)
 |-- title: string (nullable = true)
scala> df.schema.toDDL
res86: String = author STRUCT<firstname: STRING, lastname: STRING>,category ARRAY<STRING>,editor STRUCT<firstname: STRING, lastname: STRING>,isbn STRING,title STRING
scala> :paste
// Entering paste mode (ctrl-D to finish)

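implicit class DDL(val schema: org.apache.spark.sql.types.StructType) {
    // Assumes a spark-shell session: `spark` is in scope and spark.implicits._
    // is already imported (needed for .as[String]).
    def printDDL: Unit = {
        val tableName = "_source"
        // Recreate a scratch table from the schema so Spark can render its DDL.
        spark.sql(s"DROP TABLE IF EXISTS ${tableName}")
        spark.sql(s"CREATE TABLE IF NOT EXISTS ${tableName}(${schema.toDDL}) USING orc")
        println(spark.sql(s"SHOW CREATE TABLE ${tableName}")
            .as[String]
            .head
            .split("\n")
            // Keep only the column lines: drop the CREATE TABLE and USING lines.
            .filterNot(l => l.contains("CREATE") || l.contains("USING"))
            .mkString("\n ", "\n ", "")
            .dropRight(1)) // drop the ')' that closes the column list
    }
}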
scala> df.schema.printDDL

   author STRUCT<firstname: STRING, lastname: STRING>,
   category ARRAY<STRING>,
   editor STRUCT<firstname: STRING, lastname: STRING>,
   isbn STRING,
   title STRING

scala>
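
If creating a table is not an option, the DDL string can also be re-indented as plain text. The following is only a minimal sketch, not part of Spark: `DdlFormatter` and `format` are hypothetical names, and it assumes the input looks like the output of toDDL (STRUCT fields separated by commas, names optionally quoted with backticks, comments in single quotes). It puts each top-level column and each STRUCT field on its own line and leaves ARRAY<...>, MAP<...> and DECIMAL(p,s) inline.

```scala
object DdlFormatter {
  /** Re-indents a DDL string such as the one returned by StructType.toDDL. */
  def format(ddl: String, indent: String = "  "): String = {
    val out = new StringBuilder
    var open = List.empty[Boolean] // one entry per unclosed '<'; true if it opens a STRUCT
    var parens = 0                 // depth inside (...), e.g. DECIMAL(10,2)
    var quote: Option[Char] = None // set while inside `name` or 'comment'
    var i = 0

    // Start a new line, indented once per enclosing STRUCT.
    def newline(): Unit = out.append('\n').append(indent * open.count(identity))
    // Skip the spaces that follow a separator in the input.
    def skipSpaces(): Unit = while (i + 1 < ddl.length && ddl(i + 1) == ' ') i += 1

    while (i < ddl.length) {
      val c = ddl(i)
      quote match {
        case Some(q) =>
          // Copy quoted text verbatim; honour backslash escapes in 'comments'.
          out.append(c)
          if (q == '\'' && c == '\\' && i + 1 < ddl.length) { i += 1; out.append(ddl(i)) }
          else if (c == q) quote = None
        case None => c match {
          case '`' | '\'' => quote = Some(c); out.append(c)
          case '(' => parens += 1; out.append(c)
          case ')' => parens -= 1; out.append(c)
          case '<' =>
            // Break after "STRUCT<" only, and not for an empty "STRUCT<>".
            val isStruct = i >= 6 && ddl.regionMatches(true, i - 6, "STRUCT", 0, 6) &&
              !(i + 1 < ddl.length && ddl(i + 1) == '>')
            open = isStruct :: open
            out.append(c)
            if (isStruct) { newline(); skipSpaces() }
          case '>' =>
            val wasStruct = open.headOption.getOrElse(false)
            open = open.drop(1)
            if (wasStruct) newline() // closing bracket of a STRUCT gets its own line
            out.append(c)
          case ',' if parens == 0 && open.headOption.getOrElse(true) =>
            // A comma at top level or directly inside a STRUCT ends a field.
            out.append(c); newline(); skipSpaces()
          case _ => out.append(c)
        }
      }
      i += 1
    }
    out.toString
  }
}

// For the schema in the question, format(df.schema.toDDL) yields:
// author STRUCT<
//   firstname: STRING,
//   lastname: STRING
// >,
// category ARRAY<STRING>,
// editor STRUCT<
//   firstname: STRING,
//   lastname: STRING
// >,
// isbn STRING,
// title STRING
```

In spark-shell this would be called as println(DdlFormatter.format(df.schema.toDDL)). Because it only looks at the text, it needs no SparkSession and leaves the catalog untouched, but it has not been checked against every type Spark can emit.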