Scalding: reading an Avro file with a nested structure

I need to read an Avro file in Scalding but don't know how to work with it. I've worked with simple Avro files before, but this one is a little more complicated. The schema looks like this:

{"type":"record",
 "name":"features",
 "namespace":"OurCode",
 "fields":[{"name":"key","type":"long"},
       {"name":"features",
        "type":{"type":"map","values":"double"}}]
}

      

Not sure how to read this data when the second "field" is a nested field containing multiple fields within it, and when each record contains a potentially different set of nested fields.

I first tried reading it in with UnpackedAvroSource and writing it out as Tsv, but I ended up with data that looked like this:

key1   {var1=4, var2=3, var4=10}
key2   {var3=15, var4=9, var5=22}

      

I also tried creating a case class:

case class FileType(var key:Long, var features:Map[String,Double])

      

and then tried to read it with:

PackedAvroSource[FileType](args("input"))

      

This fails with the error: could not find implicit value for evidence parameter of type com.twitter.scalding.avro.AvroSchemaType[FileReader.this.FileType], where FileReader is the name of the class in which the data is read.

Ultimately, I need to turn the above data into something like this:

             Var1   Var2   Var3   Var4   Var5
Key1           4      3      0     10      0
Key2           0      0     15      9     22

      

So if there is a better way to get there, that would work too.
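
To make the target shape concrete, here is a minimal plain-Scala sketch of the reshaping I have in mind, independent of Scalding and Avro. The object name `PivotSketch`, the helper `pivot`, and the numeric keys 1 and 2 are placeholders made up for illustration, not part of any library. It assumes each record has already been read as a (key, map) pair, collects the union of feature names across all records, and fills missing features with 0:

```scala
object PivotSketch {
  // Hypothetical helper (not a Scalding API): turn sparse per-key feature maps
  // into dense rows that share one sorted column list.
  def pivot(records: Seq[(Long, Map[String, Double])]): (Seq[String], Seq[(Long, Seq[Double])]) = {
    // Union of all feature names seen in any record, in sorted order.
    val columns = records.flatMap(_._2.keys).distinct.sorted
    // One dense row per record; features absent from a record default to 0.0.
    val rows = records.map { case (key, features) =>
      key -> columns.map(c => features.getOrElse(c, 0.0))
    }
    (columns, rows)
  }

  def main(args: Array[String]): Unit = {
    // Sample records mirroring the Tsv output above; keys 1 and 2 stand in for key1 and key2.
    val records = Seq(
      1L -> Map("var1" -> 4.0, "var2" -> 3.0, "var4" -> 10.0),
      2L -> Map("var3" -> 15.0, "var4" -> 9.0, "var5" -> 22.0)
    )
    val (columns, rows) = pivot(records)
    // Print a tab-separated header followed by one line per key.
    println(("key" +: columns).mkString("\t"))
    rows.foreach { case (key, values) =>
      println((key.toString +: values.map(_.toString)).mkString("\t"))
    }
  }
}
```

In a real job the column list would need a first pass over the whole dataset, since no single record is guaranteed to contain every feature.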

I'm not very proficient with Scalding or Avro files, so any help is appreciated. Let me know if you need any additional information.

Thanks.
