Let's say I have these JSON lines stored in a text file.
{"a": "...", "data": [{}]}
{"a": "...", "data": [{"b": "..."}]}
{"a": "...", "data": [{"d": "..."}]}
{"a": "...", "data": [{"b": "...", "c": "..."}]}
I would like to process the file into a Spark Dataset, but I don't know the precise schema of the data field. I used upickle to convert the JSON to a case class:
import upickle.default.{macroR, read => uread, Reader} // uread aliases upickle's read
import spark.implicits._ // assuming a SparkSession named spark is in scope, for .toDS

case class MyCC(a: String, data: Seq[ujson.Value.Obj])
implicit val r: Reader[MyCC] = macroR

sc.textFile("s3://path/to/file.txt")
  .map(uread[MyCC](_)) // parse each JSON line into MyCC
  .toDS                // Dataset[MyCC]
  .show()
Trying this, I get the following error:
java.lang.UnsupportedOperationException: No Encoder found for ujson.Value
- map value class: "ujson.Value"
- field (class: "scala.collection.mutable.LinkedHashMap", name:
"value")
- array element class: "ujson.Obj"
- field (class: "scala.collection.Seq", name: "data")
- root class: "com.mycaule.MyCC"
How do I solve this data modeling problem?
Thank you
I was able to read the data in without creating the custom encoders the error seemed to demand. I just had to define the case class properly, as sketched below.
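A minimal sketch of what "defining the case class properly" can look like, assuming every value inside the data objects is a string, as in the sample lines above. Modelling data as a Seq[Map[String, String]] keeps the keys open-ended while giving Spark fully concrete types it can derive an Encoder for, so no ujson types remain in the class:

import upickle.default.{macroR, read => uread, Reader} // uread aliases upickle's read

// The sample objects all map string keys to string values, so a Map
// captures them without pinning down the exact keys in advance.
case class MyCC(a: String, data: Seq[Map[String, String]])
implicit val r: Reader[MyCC] = macroR

sc.textFile("s3://path/to/file.txt")
  .map(uread[MyCC](_)) // upickle reads each JSON object in "data" as a Map
  .toDS                // Spark derives an Encoder for the concrete types
  .show()

If some values inside data were not strings, a nested case class with optional fields (or a Map to a broader value type) would be needed instead; the key point is that the field types must be ones Spark knows how to encode.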
Following is the output: