I'm trying to write a unit test that matches my data output, but I'm struggling to create a sample DataFrame in the right format.
The schema needs to look like this:
|-- ids: string (nullable = true)
|-- scores: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = false)
| | |-- myscore1: double (nullable = true)
| | |-- myscore2: double (nullable = true)
and the output for one row should look like this, for example:
+-----+-----------------------------------------+
|ids  |scores                                   |
+-----+-----------------------------------------+
|id_1 |{key1 -> {0.7, 1.3}, key2 -> {0.5, 1.2}} |
+-----+-----------------------------------------+
My best attempt so far is the following, but it still gives null for the scores column. What am I missing?
val exDf = List[(String, Option[String])](
    ("id_1", Some("{\"key1\":Row(0.7, 1.3), \"key2\":Row(0.5, 1.2)}"))
  ).toDF("ids", "scores")
  .withColumn("scores", from_json(col("scores"), MapType(StringType,
    StructType(Array(StructField("myscore1", DoubleType), StructField("myscore2", DoubleType))))))
I've tried a number of variations on the exDf syntax and on the schema definition, but I always get a null output for the scores column. I'm running Scala on Spark 3.3.1.
It's easier to let Scala infer the DataFrame column types. For the StructType scores, just create a case class with Option[Double] fields to make them nullable. As for the null: the sample string isn't valid JSON (Row(0.7, 1.3) is Scala syntax, not a JSON value), and from_json returns null for malformed records under its default PERMISSIVE mode.
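A minimal sketch of that approach, assuming a local SparkSession in a test; the case class name Scores and the appName are my own choices, not from the original post:

// Define the case class at top level (outside the test method) so Spark can derive an encoder.
// Option[Double] fields become nullable double fields in the inferred struct.
case class Scores(myscore1: Option[Double], myscore2: Option[Double])

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("exDf-test").getOrCreate()
import spark.implicits._

// Build the map values directly as case class instances; no JSON parsing involved.
val exDf = List(
  ("id_1", Map(
    "key1" -> Scores(Some(0.7), Some(1.3)),
    "key2" -> Scores(Some(0.5), Some(1.2))))
).toDF("ids", "scores")

exDf.printSchema()
exDf.show(false)

One caveat: Spark's reflection may infer valueContainsNull = true for the map (case class values are reference types), which can differ from a target schema declaring false. If the test asserts schema equality strictly, compare the collected data instead, or cast the column to the exact MapType.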