Hive equivalent of the Spark Vector type for CREATE TABLE

I have a Spark DataFrame in which one of the columns is of type Vector. When I create a Hive table on top of it, I don't know which Hive type it is equivalent to:

CREATE EXTERNAL TABLE mix (
  topicdist ARRAY<DOUBLE>
)
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'

      

The table creation seems to work and returns OK, but the following query fails:

select topicdist from mix limit 1

      

The error I am getting:

Failed with exception java.io.IOException:java.lang.RuntimeException: Unknown hive type info array<double> when searching for field type

      


1 answer


Vector is a Spark user-defined type (UDT) and is stored internally as:

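import org.apache.spark.sql.types._

// SQL representation of Vector; "type" is a one-byte tag, hence tinyint in Hive
StructType(Seq(
  StructField("type", ByteType, true),
  StructField("size", IntegerType, true),
  StructField("indices", ArrayType(IntegerType, true), true),
  StructField("values", ArrayType(DoubleType, true), true)
))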

      

so you will need:



CREATE EXTERNAL TABLE mix (
  topicdist struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
)
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'
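
If you would rather not write the Hive type string by hand, it can probably be derived from the Spark schema. The sketch below assumes Spark SQL is on the classpath and relies on catalogString, which renders a Spark DataType in Hive-style notation. The object name VectorHiveType is only illustrative, and the "type" field is declared as ByteType because that is the Spark type corresponding to tinyint.

```scala
import org.apache.spark.sql.types._

// Illustrative helper (hypothetical name): rebuilds the struct layout
// described above and renders it as a Hive-style type string.
object VectorHiveType {
  val vectorStruct: StructType = StructType(Seq(
    StructField("type", ByteType, nullable = true),
    StructField("size", IntegerType, nullable = true),
    StructField("indices", ArrayType(IntegerType, containsNull = true), nullable = true),
    StructField("values", ArrayType(DoubleType, containsNull = true), nullable = true)
  ))

  // catalogString renders the type in the notation used in Hive DDL.
  val hiveType: String = vectorStruct.catalogString

  def main(args: Array[String]): Unit =
    println(s"topicdist $hiveType")
}
```

The printed line should match the column definition used in the CREATE EXTERNAL TABLE statement.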

      

Please note that the resulting column will not be interpreted as a Spark Vector; Hive sees it as a plain struct.
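
Since the column arrives as a plain struct, whoever reads it has to interpret the four fields. The following is a minimal plain-Scala sketch of that interpretation with no Spark dependency. It assumes the encoding used by Spark's Vector UDT as I understand it: a type tag of 0 means sparse (size, indices and values are set) and 1 means dense (only values is set, size and indices are NULL). The names VectorStruct and toDenseArray are hypothetical.

```scala
// Plain-Scala mirror of the Hive struct. `tag` corresponds to the "type"
// field; Option models the NULLs expected for dense vectors (assumption).
final case class VectorStruct(
  tag: Byte,
  size: Option[Int],
  indices: Option[Seq[Int]],
  values: Seq[Double]
)

object VectorStruct {
  // Expand the struct into a dense array of doubles.
  def toDenseArray(v: VectorStruct): Array[Double] = v.tag.toInt match {
    case 1 =>
      // Dense: values already hold every entry.
      v.values.toArray
    case 0 =>
      // Sparse: start from zeros and fill in the stored entries.
      val out = Array.fill(v.size.getOrElse(0))(0.0)
      v.indices.getOrElse(Seq.empty).zip(v.values).foreach {
        case (i, x) => out(i) = x
      }
      out
    case other =>
      throw new IllegalArgumentException(s"Unexpected vector type tag: $other")
  }
}
```

For example, VectorStruct.toDenseArray(VectorStruct(0.toByte, Some(4), Some(Seq(1, 3)), Seq(0.5, 2.0))) yields an array with 0.5 at index 1, 2.0 at index 3, and zeros elsewhere.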
