Scalding: reading an Avro file with a nested structure
I need to read an Avro file in Scalding but don't know how to work with it. I've worked with simple Avro files before, but this one is a little more complicated. The schema looks like this:
{"type": "record",
 "name": "features",
 "namespace": "OurCode",
 "fields": [
   {"name": "key", "type": "long"},
   {"name": "features", "type": {"type": "map", "values": "double"}}
 ]
}
I'm not sure how to read this data when the second field is a nested map containing multiple entries, and each record may contain a different set of entries.
I first tried reading it with UnpackedAvroSource and writing it out as Tsv, but I ended up with data that looked like this:
key1 {var1=4, var2 = 3, var4 = 10}
key2 {var3 = 15, var4 = 9, var5 = 22}
I also tried creating a case class:
case class FileType(var key:Long, var features:Map[String,Double])
and then tried to read it with:
PackedAvroSource[FileType](args("input"))
This fails with the error: could not find implicit value for evidence parameter of type com.twitter.scalding.avro.AvroSchemaType[FileReader.this.FileType], where FileReader is the name of the class in which the data is read.
Ultimately, I need to turn the above data into something like this:
Var1 Var2 Var3 Var4 Var5
Key1 1 3 0 10 0
Key2 0 0 15 9 22
If there is a better way to get to this result, that would work too.
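To make the target shape concrete, here is a minimal sketch of the reshaping step in plain Scala, with no Scalding or Avro dependency. The Pivot object and its method names are hypothetical, and it assumes each record has already been read as a (Long, Map[String, Double]) pair, which is exactly the part I have not managed to do yet. The sample values are illustrative only.

```scala
// Hypothetical helper for illustration; not part of Scalding or Avro.
object Pivot {
  // Mirrors the schema: a long key plus a sparse map of feature values.
  type Record = (Long, Map[String, Double])

  // Illustrative sample data (assumed values, keyed by Long as in the schema).
  val sample: Seq[Record] = Seq(
    (1L, Map("var1" -> 4.0, "var2" -> 3.0, "var4" -> 10.0)),
    (2L, Map("var3" -> 15.0, "var4" -> 9.0, "var5" -> 22.0))
  )

  // Collect every feature name that appears in any record, sorted by name.
  def columns(records: Seq[Record]): Seq[String] =
    records.flatMap(_._2.keys).distinct.sorted

  // Expand one sparse record into a dense row, filling absent features with 0.0.
  def toRow(cols: Seq[String])(record: Record): (Long, Seq[Double]) =
    (record._1, cols.map(c => record._2.getOrElse(c, 0.0)))

  def main(args: Array[String]): Unit = {
    val cols = columns(sample)
    println(("key" +: cols).mkString("\t"))
    sample.map(toRow(cols)).foreach { case (key, values) =>
      println((key.toString +: values.map(_.toString)).mkString("\t"))
    }
  }
}
```

On a real dataset this would presumably need two passes, one to collect the full set of feature names and one to expand each record, since the columns are not known up front.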
I'm not very proficient with Scalding or Avro files, so any help is appreciated. Let me know if any additional information is needed.
Thanks.