Spark CSV output fails validation for Hive date and timestamp data

Hive table schema:

c_date                  date                                        
c_timestamp             timestamp   

      

The table is stored as a text table.

Hive table data:

hive> select * from all_datetime_types;
OK
0001-01-01  0001-01-01 00:00:00.000000001
9999-12-31  9999-12-31 23:59:59.999999999

      

CSV produced by the Spark job:

c_date,c_timestamp
0001-01-01 00:00:00.0,0001-01-01 00:00:00.0
9999-12-31 00:00:00.0,9999-12-31 23:59:59.999

      

Issues:

  • 00:00:00.0 is appended to the values of the date column
  • the timestamp is truncated to millisecond precision
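Both symptoms can be reproduced in plain Java with java.sql.Date and java.sql.Timestamp, which may help narrow down where the values change. The sketch below is only an illustration: the class name CsvSymptomDemo and its helper methods are hypothetical, and whether spark-csv actually takes these conversion paths is an assumption, not something confirmed here.

```java
import java.sql.Date;
import java.sql.Timestamp;

// Hypothetical demo class; it does not use Spark or spark-csv.
class CsvSymptomDemo {

    // Widening a date to a Timestamp keeps the day and adds a midnight time part,
    // which Timestamp.toString() prints as "00:00:00.0".
    static String dateAsTimestamp(String isoDate) {
        Date d = Date.valueOf(isoDate);
        return new Timestamp(d.getTime()).toString();
    }

    // Rebuilding a Timestamp from epoch milliseconds drops the sub-millisecond digits.
    static String viaEpochMillis(String value) {
        Timestamp full = Timestamp.valueOf(value);
        return new Timestamp(full.getTime()).toString();
    }

    // Converting each value directly to a string keeps it unchanged.
    static String dateAsString(String isoDate) {
        return Date.valueOf(isoDate).toString();
    }

    static String timestampAsString(String value) {
        return Timestamp.valueOf(value).toString();
    }

    public static void main(String[] args) {
        System.out.println(dateAsTimestamp("9999-12-31"));                      // 9999-12-31 00:00:00.0
        System.out.println(viaEpochMillis("9999-12-31 23:59:59.999999999"));    // 9999-12-31 23:59:59.999
        System.out.println(dateAsString("9999-12-31"));                         // 9999-12-31
        System.out.println(timestampAsString("9999-12-31 23:59:59.999999999")); // 9999-12-31 23:59:59.999999999
    }
}
```

The first two lines of output match the second CSV row above, while the last two match the Hive values. That suggests that converting the columns to strings before they reach the CSV writer could preserve them.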

Relevant code:

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("SAMPLE_APP");
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);
DataFrame df = hc.table("testdb.all_datetime_types");
df.printSchema();
DataFrameWriter writer = df.repartition(1).write();
writer.format("com.databricks.spark.csv").option("header", "true").save(outputHdfsFile);

      


I know about the dateFormat option, but the date and timestamp columns can have different formats in Hive.

Is it possible to simply cast all columns to string?



1 answer


You can use the timestampFormat option in Spark to specify your timestamp format.



spark.read.option("timestampFormat", "MM/dd/yyyy h:mm:ss a").csv("path")

      
