Getting null output when reading BigQuery data into a DataFrame with a specified schema

I am facing an issue when reading data from a BigQuery table into a Spark DataFrame with an explicitly specified schema.

val tableData: RDD[String] = sqlContext.sparkContext.newAPIHadoopRDD(
  hadoopConf,
  classOf[GsonBigQueryInputFormat],
  classOf[LongWritable],
  classOf[JsonObject]).map(_._2.toString)

val jsonSchema: StructType = (new StructType)
  .add("f1", IntegerType, true)
  .add("f2", FloatType, true)
  .add("f3", StringType, true)
  .add("f4", BooleanType, true)
  .add("f5", DateType, true)
  .add("f6", TimestampType, true)

val df = sqlContext.read.schema(jsonSchema).json(tableData)

      

When I specify the schema as shown above, every column in the DataFrame comes back as null. When I do not specify a schema, the results are correct.

df.printSchema()

root
|-- f1: integer (nullable = true)
|-- f2: float (nullable = true)
|-- f3: string (nullable = true)
|-- f4: boolean (nullable = true)
|-- f5: date (nullable = true)
|-- f6: timestamp (nullable = true)

df.show
+----+----+----+----+----+----+
|  f1|  f2|  f3|  f4|  f5|  f6|
+----+----+----+----+----+----+
|null|null|null|null|null|null|

      

Upon analysis, I found that BigQuery exports table data in a format like the following example:

{"f1":"3","f2":2.7,"f3":"Anna","f4":true,"f5":"2014-10-15","f6":"2014-10-15 03:15:58 UTC"}

...

When I read tableData as JSON, Spark cannot parse the data with the specified schema and returns null.

How can I get the correct output with the above schema? Please suggest if you have an idea / solution.
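
One direction that might be worth trying, sketched here under assumptions rather than verified against this setup: in the sample row, f1 arrives as the JSON string "3" while the schema declares IntegerType, and f6 carries a " UTC" suffix. A mismatch like this is a plausible reason for the JSON reader to null out the whole row. The sketch reads every field as StringType and then casts each column to its target type. The helper name readWithCasts and the timestamp pattern are illustrative assumptions, not part of the code above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, unix_timestamp}
import org.apache.spark.sql.types._

// Hypothetical helper (not from the original post): read all fields as
// strings, then cast each column to the type the schema was meant to have.
def readWithCasts(sqlContext: SQLContext, tableData: RDD[String]): DataFrame = {
  val names = Seq("f1", "f2", "f3", "f4", "f5", "f6")
  val stringSchema =
    StructType(names.map(n => StructField(n, StringType, nullable = true)))

  // With an all-string schema there is no type mismatch at parse time.
  val raw = sqlContext.read.schema(stringSchema).json(tableData)

  raw.select(
    col("f1").cast(IntegerType).as("f1"),
    col("f2").cast(FloatType).as("f2"),
    col("f3"),
    col("f4").cast(BooleanType).as("f4"),
    col("f5").cast(DateType).as("f5"),
    // Assumption: this pattern matches values such as "2014-10-15 03:15:58 UTC".
    // unix_timestamp returns seconds; casting to TimestampType converts them.
    unix_timestamp(col("f6"), "yyyy-MM-dd HH:mm:ss z").cast(TimestampType).as("f6")
  )
}
```

Calling readWithCasts(sqlContext, tableData) would then take the place of the read with jsonSchema. Values that fail a cast become null in that column only, which should make it easier to see which field is the problem.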
