I need to read an Avro file in Scalding but have no idea how to work with it. I have worked with straightforward Avro files before, but this one is a little more complicated. The schema looks like this:
{"type":"record",
"name":"features",
"namespace":"OurCode",
"fields":[{"name":"key","type":"long"},
{"name":"features",
"type":{"type":"map","values":"double"}}]
}
I'm not sure how to read this data, since the second field, features, is a map rather than a plain value: each record carries its own set of named values, and different records may contain different keys.
I initially tried reading it with UnpackedAvroSource and writing the result to a Tsv, but I ended up with data that looked like this:
key1 {var1=4, var2 = 3, var4 = 10}
key2 {var3 = 15, var4 = 9, var5 = 22}
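The `{var1=4, ...}` cells look like the `toString` of a `java.util.Map`, which suggests the features field arrives as a Java map rather than a Scala one. Assuming that, a minimal stdlib-only helper like the hypothetical `toScalaFeatures` below could turn each such map into a `Map[String, Double]` in a later `map` step. Keys are converted with `toString` because, as far as I know, the Avro Java runtime may return string keys as `org.apache.avro.util.Utf8` rather than `String`; that is an assumption, not something verified against this file.

```scala
import java.{util => ju}

object FeatureMaps {
  // Hypothetical helper: converts a raw Java map (as an Avro reader might
  // return it) into an immutable Scala Map[String, Double].
  // Keys go through toString because Avro string keys may arrive as
  // org.apache.avro.util.Utf8 instead of java.lang.String (assumption).
  def toScalaFeatures(raw: ju.Map[_, _]): Map[String, Double] = {
    val builder = Map.newBuilder[String, Double]
    val it = raw.entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      // Avro "double" values come back boxed; accept any java.lang.Number.
      val value = entry.getValue match {
        case n: java.lang.Number => n.doubleValue
        case other =>
          throw new IllegalArgumentException(s"non-numeric feature value: $other")
      }
      builder += (entry.getKey.toString -> value)
    }
    builder.result()
  }
}
```

Accepting `ju.Map[_, _]` avoids committing to a key type, so the same helper works whether the keys come back as `String`, `Utf8`, or another `CharSequence`.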
I also tried creating a case class:
case class FileType(key: Long, features: Map[String, Double])
and reading the file with:
PackedAvroSource[FileType](args("input"))
That gave me this error: could not find implicit value for evidence parameter of type com.twitter.scalding.avro.AvroSchemaType[FileReader.this.FileType], where FileReader is the name of the class in which the data is read.
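For context, that message is how Scala 2 reports a missing context bound: a signature like `def f[T: TC]` desugars to an extra implicit parameter of type `TC[T]` (the "evidence parameter"), and compilation fails when no instance exists for the requested `T`. The sketch below uses a hypothetical `SchemaType` trait as a stand-in for `AvroSchemaType`; it does not reproduce Scalding's real instances. My understanding, which I have not verified, is that scalding-avro provides `AvroSchemaType` instances for primitive types and for Avro-generated `SpecificRecord` classes rather than for arbitrary case classes, which would explain why `FileType` has none.

```scala
// Hypothetical stand-in for a type class such as AvroSchemaType: each
// instance knows how to describe the schema of one Scala type.
trait SchemaType[T] {
  def schemaName: String
}

object SchemaType {
  // Instances exist only for the types listed here; the compiler finds
  // them because they live in the type class's companion object.
  implicit val longSchema: SchemaType[Long] = new SchemaType[Long] {
    def schemaName = "long"
  }
  implicit val stringSchema: SchemaType[String] = new SchemaType[String] {
    def schemaName = "string"
  }
}

object EvidenceDemo {
  // `[T: SchemaType]` is a context bound; the compiler rewrites it to an
  // extra implicit parameter list holding a SchemaType[T] "evidence" value.
  def describe[T: SchemaType]: String = implicitly[SchemaType[T]].schemaName

  // The same method with the evidence parameter written out explicitly.
  def describeExplicit[T](implicit ev: SchemaType[T]): String = ev.schemaName
}

case class FileType(key: Long, features: Map[String, Double])

// EvidenceDemo.describe[FileType] does not compile, because no
// SchemaType[FileType] instance exists. In Scala 2 the message has the form:
// "could not find implicit value for evidence parameter of type SchemaType[FileType]"
```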
Ultimately, I need to turn the above data into a table with one column per variable and 0 wherever a key has no value for that variable:
      Var1  Var2  Var3  Var4  Var5
Key1     4     3     0    10     0
Key2     0     0    15     9    22
If there is a better way to get to that result, that would work too.
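Once each record is a `(key, Map[String, Double])` pair, the reshaping itself is a pivot: collect every feature name across all records, use the sorted names as columns, and fill 0.0 wherever a record lacks a feature. The sketch below does this in plain Scala on an in-memory `Seq`, only to pin down the transformation; `FeaturePivot` is a hypothetical name, and in a real Scalding job the column set would have to be computed over the whole dataset first, since no row can be laid out until every column name is known.

```scala
object FeaturePivot {
  // Returns (columns, rows): columns is the sorted union of all feature
  // names, and each row pairs a key with one value per column.
  def pivot(records: Seq[(Long, Map[String, Double])]): (Seq[String], Seq[(Long, Seq[Double])]) = {
    // Union of feature names over every record, sorted for a stable column order.
    val columns: Seq[String] = records.flatMap(_._2.keys).distinct.sorted
    val rows = records.map { case (key, features) =>
      // Features missing from this record become 0.0.
      key -> columns.map(name => features.getOrElse(name, 0.0))
    }
    (columns, rows)
  }

  // Renders the pivoted data as tab-separated lines, header line first.
  def toTsv(columns: Seq[String], rows: Seq[(Long, Seq[Double])]): Seq[String] =
    ("key" +: columns).mkString("\t") +:
      rows.map { case (key, values) => (key.toString +: values.map(_.toString)).mkString("\t") }
}
```

For the two example records this yields columns var1 through var5, with 4.0, 3.0, 0.0, 10.0, 0.0 for the first key and 0.0, 0.0, 15.0, 9.0, 22.0 for the second. Sorting the names keeps the column order deterministic regardless of the order in which records are read.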
I'm not very experienced with Scalding or Avro, so any help is appreciated. Let me know if there is other information I should provide.
Thanks.