Scalding: Trouble reading avro file with nested structure


I need to read an Avro file in Scalding but am not sure how to handle it. I have worked with flat Avro files before, but this one has a nested structure. The schema looks like this:

{"type":"record",
 "name":"features",
 "namespace":"OurCode",
 "fields":[{"name":"key","type":"long"},
       {"name":"features",
        "type":{"type":"map","values":"double"}}]
}

I am not sure how to read this data, because the second field, "features", is a map rather than a single value, and each record can contain a different set of keys in that map.

I initially tried reading it with UnpackAvroSource and writing the result to a Tsv, but the whole map ended up in one column:

key1   {var1=4, var2 = 3, var4 = 10}
key2   {var3 = 15, var4 = 9, var5 = 22}

I also tried creating a case class:

case class FileType(var key:Long, var features:Map[String,Double])

and then tried to read it in with:

PackedAvroSource[FileType](args("input"))

That fails to compile with the error "could not find implicit value for evidence parameter of type com.twitter.scalding.avro.AvroSchemaType[FileReader.this.FileType]", where FileReader is the name of the class in which the data is being read.
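The wording of that error suggests PackedAvroSource requires an implicit AvroSchemaType instance for its type parameter, and that none is in scope for my case class. Here is a minimal plain-Scala sketch of how I understand that mechanism. SchemaType and describe are hypothetical stand-ins I made up for illustration, not the real Scalding API:

```scala
object EvidenceSketch {
  // Hypothetical type class standing in for AvroSchemaType; NOT the real Scalding trait.
  trait SchemaType[T] { def name: String }

  // An instance is defined for Long, but none is defined for FileType below.
  implicit val longSchema: SchemaType[Long] = new SchemaType[Long] { val name = "long" }

  final case class FileType(key: Long, features: Map[String, Double])

  // The context bound `T: SchemaType` desugars to an implicit evidence parameter.
  def describe[T: SchemaType]: String = implicitly[SchemaType[T]].name

  // Compiles, because an implicit SchemaType[Long] is in scope.
  val ok: String = describe[Long]

  // Uncommenting the next line fails to compile with the same kind of
  // "could not find implicit value for evidence parameter" error:
  // val bad: String = describe[FileType]
}
```

If that reading is right, the real question is which types Scalding's Avro module provides such an instance for, and whether a plain case class with a Map field can be one of them.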

Ultimately, I need to turn the above data into something that looks like:

        Var1   Var2   Var3   Var4   Var5
Key1       4      3      0     10      0
Key2       0      0     15      9     22

If there is a better way to get to that layout, that would work too.
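To make the target concrete, here is a rough sketch of the transformation in plain Scala collections, with no Scalding or Avro dependency. The Features case class and the PivotSketch object are hypothetical names that only mirror the schema above, and a missing feature is assumed to mean 0:

```scala
object PivotSketch {
  // Hypothetical stand-in for one decoded record (key plus map of features).
  final case class Features(key: Long, features: Map[String, Double])

  // Union of all feature names across records, sorted for a stable column order.
  def columns(records: Seq[Features]): Seq[String] =
    records.flatMap(_.features.keys).distinct.sorted

  // One row per record: the key, then one value per column,
  // falling back to 0.0 when the record does not contain that feature.
  def pivot(records: Seq[Features]): Seq[(Long, Seq[Double])] = {
    val cols = columns(records)
    records.map(r => (r.key, cols.map(c => r.features.getOrElse(c, 0.0))))
  }
}
```

For example, calling PivotSketch.pivot on two records shaped like the Tsv output above yields one row per key with five values each. In an actual Scalding job I assume the column list would have to come from a separate pass over the data (or be known in advance), since it depends on every record.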

I am not very experienced with Scalding or Avro, so any help is appreciated. Let me know what other information I should provide.

Thanks.
