Convert string column to JSON map of structs in Scala

I'm trying to write a unit test that matches my data output, but I'm struggling to create a sample DataFrame in the right format.

The schema needs to look like this:

    |-- ids: string (nullable = true)
    |-- scores: map (nullable = true)
    |    |-- key: string
    |    |-- value: struct (valueContainsNull = false)
    |    |    |-- myscore1: double (nullable = true)
    |    |    |-- myscore2: double (nullable = true)

and the output for one row should look like this, for example:

    +-----+-----------------------------------------+
    |ids  |scores                                   |
    +-----+-----------------------------------------+
    |id_1 |{key1 -> {0.7, 1.3}, key2 -> {0.5, 1.2}} |
    +-----+-----------------------------------------+

My best attempt so far is below, but it still gives null for the scores column. What am I missing?

val exDf = List[(String, Option[String])](("id_1", Some("{\"key1\":Row(0.7, 1.3), \"key2\":Row(0.5, 1.2)}"))).toDF("ids", "scores")
.withColumn("scores",from_json(col("scores"), MapType(StringType, StructType(Array(StructField("myscore1", DoubleType), StructField("myscore2", DoubleType))))))

I've tried several variations of both the exDf syntax and the schema definition, but I always get null for the scores column. I'm running Scala on Spark 3.3.1.

Best answer (Test Mirror)

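The scores column comes back null because the input string is not valid JSON: Row(0.7, 1.3) is Scala syntax, not a JSON object, so from_json cannot parse it. For a test fixture it's easier to skip JSON and let Spark derive the DataFrame column types from Scala types. For the struct values in scores, define a case class with Option[Double] fields so that they are nullable.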

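// Outside spark-shell, toDF needs: import spark.implicits._
case class ScoreVal(myscore1: Option[Double], myscore2: Option[Double])

// Spark derives map<string, struct<myscore1, myscore2>> from the Scala types.
val exDf = Seq(
  ("id_1", Map("key1" -> ScoreVal(Some(0.7), Some(1.3)), "key2" -> ScoreVal(Some(0.5), Some(1.2)))),
  ("id_2", Map("key3" -> ScoreVal(Some(2.0), None)))  // None becomes a null myscore2
).toDF("ids", "scores")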

exDf.printSchema
root
 |-- ids: string (nullable = true)
 |-- scores: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- myscore1: double (nullable = true)
 |    |    |-- myscore2: double (nullable = true)
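
If the test should still go through from_json, as in the question, each map value has to be written as a JSON object keyed by the struct field names. Separately, the schema printed above reports valueContainsNull = true while the target schema has false, and building the DataFrame from Rows with an explicit schema is one way to control that flag. A minimal sketch of both, assuming a local SparkSession created just for the test (the names spark, scoreType, fromJsonDf, and explicitDf are illustrative, not from the original post):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Assumption: a throwaway local session; reuse your test suite's session if it has one.
val spark = SparkSession.builder().master("local[1]").appName("scores-example").getOrCreate()
import spark.implicits._

// Struct type shared by both variants.
val scoreType = StructType(Seq(
  StructField("myscore1", DoubleType),
  StructField("myscore2", DoubleType)
))

// Variant 1: from_json with valid JSON input.
// Each map value is a JSON object whose keys match the struct field names.
val json = """{"key1":{"myscore1":0.7,"myscore2":1.3},"key2":{"myscore1":0.5,"myscore2":1.2}}"""
val fromJsonDf = Seq(("id_1", json)).toDF("ids", "scores")
  .withColumn("scores", from_json(col("scores"), MapType(StringType, scoreType)))

// Variant 2: Rows plus an explicit schema, so valueContainsNull can be set to false.
val schema = StructType(Seq(
  StructField("ids", StringType),
  StructField("scores", MapType(StringType, scoreType, valueContainsNull = false))
))
val rows = Seq(Row("id_1", Map("key1" -> Row(0.7, 1.3), "key2" -> Row(0.5, 1.2))))
val explicitDf = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
```

Calling explicitDf.printSchema should show valueContainsNull = false for the map values, as in the schema from the question. It is worth checking fromJsonDf.printSchema on your Spark version before comparing schemas in a test, since the nullability flags it reports may differ from the ones passed in.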