Getting null output when specifying a schema to read data in a BigQuery select operation
I am facing an issue when selecting data from a BigQuery table with the specified schema.
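import com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
val tableData: RDD[String] = sqlContext.sparkContext.newAPIHadoopRDD(
  hadoopConf,
  classOf[GsonBigQueryInputFormat],
  classOf[LongWritable],
  classOf[JsonObject]).map(_._2.toString)
val jsonSchema: StructType = (new StructType)
  .add("f1", IntegerType, true).add("f2", FloatType, true).add("f3", StringType, true)
  .add("f4", BooleanType, true).add("f5", DateType, true).add("f6", TimestampType, true)
val df = sqlContext.read.schema(jsonSchema).json(tableData)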
When I specify the schema as shown above, every column in the DataFrame comes back null. When no schema is specified, the results are correct.
df.printSchema()
root
|-- f1: integer (nullable = true)
|-- f2: float (nullable = true)
|-- f3: string (nullable = true)
|-- f4: boolean (nullable = true)
|-- f5: date (nullable = true)
|-- f6: timestamp (nullable = true)
df.show
+----+----+----+----+----+----+
| f1| f2| f3| f4| f5| f6|
+----+----+----+----+----+----+
|null|null|null|null|null|null|
Upon analysis, I found that BigQuery exports the table data in a format like the following example:
{"f1":"3","f2":2.7,"f3":"Anna","f4":true,"f5":"2014-10-15","f6":"2014-10-15 03:15:58 UTC"}
...
When I read tableData as JSON, Spark cannot parse the data with the specified schema and returns null.
How can I get the correct output with the above schema? Please suggest if you have an idea / solution.
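One direction that might be worth trying, judging only from the sample record above: f1 arrives as the JSON string "3" rather than a number, and f6 carries a trailing " UTC", so neither matches the IntegerType / TimestampType declared in the schema. A possible workaround is to read those fields as strings and cast them afterwards. The snippet below is a minimal sketch, not a confirmed fix. It assumes Spark 2.2 or later (for SparkSession and json(Dataset[String])), uses a local session and a one-row sample as stand-ins for sqlContext and tableData, and the names rawSchema, raw and typed are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, to_date}
import org.apache.spark.sql.types._

// Local session for illustration only (assumption); the original code would
// use its existing sqlContext. The session time zone is pinned to UTC so the
// timestamp string is interpreted as UTC once the " UTC" suffix is stripped.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("bq-json-schema-sketch")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

// Stand-in for tableData: one record in the format shown in the question.
val sample = Seq(
  """{"f1":"3","f2":2.7,"f3":"Anna","f4":true,"f5":"2014-10-15","f6":"2014-10-15 03:15:58 UTC"}"""
).toDS()

// Declare f1, f5 and f6 as strings, matching how they appear in the JSON.
val rawSchema = new StructType()
  .add("f1", StringType, true)
  .add("f2", FloatType, true)
  .add("f3", StringType, true)
  .add("f4", BooleanType, true)
  .add("f5", StringType, true)
  .add("f6", StringType, true)

val raw = spark.read.schema(rawSchema).json(sample)

// Convert to the intended types after parsing.
val typed = raw
  .withColumn("f1", col("f1").cast(IntegerType))
  .withColumn("f5", to_date(col("f5")))
  .withColumn("f6", regexp_replace(col("f6"), " UTC$", "").cast(TimestampType))

typed.printSchema()
typed.show(false)
```

If this works on the real data, the same rawSchema and casts could be applied to tableData in place of sample. Stripping the " UTC" suffix is only safe if every exported timestamp is in UTC, which is an assumption based on the single sample row.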