Spark CSV data validation fails for Hive date and timestamp types

Question

Hive table schema:

c_date                  date                                        
c_timestamp             timestamp

The table is stored as text.

Hive table data:

hive> select * from all_datetime_types;
OK
0001-01-01  0001-01-01 00:00:00.000000001
9999-12-31  9999-12-31 23:59:59.999999999

CSV produced by the Spark job:

c_date,c_timestamp
0001-01-01 00:00:00.0,0001-01-01 00:00:00.0
9999-12-31 00:00:00.0,9999-12-31 23:59:59.999

Problems:

00:00:00.0 is appended to the date column
the timestamp is truncated to millisecond precision
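
To illustrate the second problem outside Spark, here is a minimal plain-Java sketch using java.time. It only shows how a millisecond pattern drops digits that a nine-digit fraction pattern keeps; it does not claim to reproduce what spark-csv does internally. The class name TimestampPrecisionDemo and its members are hypothetical, chosen for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class, not part of Spark or spark-csv.
class TimestampPrecisionDemo {
    // Three fractional digits: anything below a millisecond is cut off.
    static final DateTimeFormatter MILLIS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");
    // Nine fractional digits: keeps the nanoseconds shown in the Hive output.
    static final DateTimeFormatter NANOS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS");

    static String format(LocalDateTime value, DateTimeFormatter formatter) {
        return value.format(formatter);
    }

    public static void main(String[] args) {
        // The two timestamp values from the Hive table in the question.
        LocalDateTime min = LocalDateTime.of(1, 1, 1, 0, 0, 0, 1);
        LocalDateTime max = LocalDateTime.of(9999, 12, 31, 23, 59, 59, 999_999_999);

        System.out.println(format(min, MILLIS)); // 0001-01-01 00:00:00.000
        System.out.println(format(min, NANOS));  // 0001-01-01 00:00:00.000000001
        System.out.println(format(max, MILLIS)); // 9999-12-31 23:59:59.999
        System.out.println(format(max, NANOS));  // 9999-12-31 23:59:59.999999999
    }
}
```

The millisecond lines lose the same digits as the CSV above, which suggests the values pass through a millisecond-precision representation somewhere before they are written.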

Relevant code:

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("SAMPLE_APP");
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);
DataFrame df = hc.table("testdb.all_datetime_types");
df.printSchema();
DataFrameWriter writer = df.repartition(1).write();
writer.format("com.databricks.spark.csv").option("header", "true").save(outputHdfsFile);

I know about the dateFormat option, but the date and timestamp columns can have different formats in Hive.

Is it possible to simply cast all columns to string?

csv hive apache-spark apache-spark-sql databricks

dev ツ 23 Mar 17 at 14:38

1 answer

raam86 · Answer 1 · 2017-03-24T08:49:02+0000

You can use the timestampFormat option in Spark to specify your timestamp format.

spark.read.option("timestampFormat", "MM/dd/yyyy h:mm:ss a").csv("path")
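
As a rough check of what that pattern produces, the plain-Java sketch below applies the same pattern string with java.time. It assumes the pattern letters mean the same thing to Spark as they do to DateTimeFormatter, which is an assumption rather than something the answer states. The class name AnswerPatternDemo and the sample value are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class, not part of Spark.
class AnswerPatternDemo {
    // Pattern string copied from the answer; Locale.US fixes the AM/PM text.
    static final DateTimeFormatter ANSWER_PATTERN =
            DateTimeFormatter.ofPattern("MM/dd/yyyy h:mm:ss a", Locale.US);

    static String render(LocalDateTime value) {
        return value.format(ANSWER_PATTERN);
    }

    public static void main(String[] args) {
        // Sample value with a fractional second of 123 ms.
        LocalDateTime sample = LocalDateTime.of(2017, 3, 24, 8, 49, 2, 123_000_000);
        System.out.println(render(sample)); // 03/24/2017 8:49:02 AM
    }
}
```

Note that this pattern has no fractional-second field, so it would drop sub-second digits entirely. Someone who needs the precision from the question would have to add a fraction field to the pattern, and whether more than three digits survive depends on the Spark version in use.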
