Hive type equivalent to Spark Vector for CREATE TABLE
I have a Spark DataFrame in which one of the columns is of type Vector. When I create a Hive table on top of it, I don't know which Hive type it is equivalent to:
CREATE EXTERNAL TABLE mix (
topicdist ARRAY<DOUBLE>
)
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'
The table creation seems to work and returns OK, but when I try
select topicdist from mix limit 1
The error I am getting:
Failed with exception java.io.IOException:java.lang.RuntimeException: Unknown hive type info array<double> when searching for field type
1 answer
Vector is a user-defined type (UDT) in Spark and is stored internally as
StructType(Seq(
  StructField("type", ByteType, true),
  StructField("size", IntegerType, true),
  StructField("indices", ArrayType(IntegerType, true), true),
  StructField("values", ArrayType(DoubleType, true), true)
))
so you will need:
CREATE EXTERNAL TABLE mix (
topicdist struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
)
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'
Please note that Hive will see the resulting column as a plain struct; it will not be interpreted as a Spark Vector.
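As a rough sketch of what querying that struct could look like, the fields can be read individually with dot notation. The table and column names are the ones from the DDL above; the output aliases are made up for illustration. Two assumptions to check against your own data: Spark's encoding uses type = 1 for a dense vector (only values is populated) and type = 0 for a sparse one (size, indices and values are populated), and the field names type and values may need backticks depending on your Hive version's reserved keywords.

```sql
-- Sketch only: reads the raw struct fields, not a Spark Vector.
-- Assumption: type = 1 means dense, type = 0 means sparse.
-- Backticks guard against `type` / `values` being treated as keywords.
SELECT
  topicdist.`type`   AS vec_type,     -- 0 = sparse, 1 = dense (assumed encoding)
  topicdist.size     AS vec_size,     -- expected to be NULL for dense vectors
  topicdist.indices  AS vec_indices,  -- expected to be NULL for dense vectors
  topicdist.`values` AS vec_values    -- the actual DOUBLE entries
FROM mix
LIMIT 1;
```

For a dense vector, vec_values alone should carry the same numbers the original ARRAY<DOUBLE> column was meant to expose.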