Spark CSV data validation fails for Hive date and timestamp types
Hive table schema:
c_date date
c_timestamp timestamp
The table is stored as a text table.
Hive table data:
hive> select * from all_datetime_types;
OK
0001-01-01 0001-01-01 00:00:00.000000001
9999-12-31 9999-12-31 23:59:59.999999999
CSV produced by the Spark job:
c_date,c_timestamp
0001-01-01 00:00:00.0,0001-01-01 00:00:00.0
9999-12-31 00:00:00.0,9999-12-31 23:59:59.999
Problems:
- 00:00:00.0 is appended to the date column
- the timestamp is truncated to millisecond precision
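Both symptoms are consistent with the values being round-tripped through epoch milliseconds before they are written. That is only an assumption here and has not been checked against the spark-csv source. A minimal plain-Java sketch reproduces the same strings from java.sql.Date and java.sql.Timestamp. The class DateTimeStringDemo and the helper viaMillis are hypothetical names used only for illustration.

```java
import java.sql.Date;
import java.sql.Timestamp;

public class DateTimeStringDemo {
    // Hypothetical helper: rebuilds the value from its epoch milliseconds,
    // which drops sub-millisecond digits and turns a date into a timestamp.
    static String viaMillis(java.util.Date value) {
        return new Timestamp(value.getTime()).toString();
    }

    public static void main(String[] args) {
        Date d = Date.valueOf("9999-12-31");
        Timestamp ts = Timestamp.valueOf("9999-12-31 23:59:59.999999999");

        // Direct string forms keep the original shape and precision.
        System.out.println(d.toString());
        System.out.println(ts.toString());

        // Round trip through milliseconds: a time part is added to the date
        // and the timestamp loses everything below the millisecond.
        System.out.println(viaMillis(d));
        System.out.println(viaMillis(ts));
    }
}
```

If the writer does go through milliseconds, converting the columns to strings before writing would sidestep that step. Whether Spark's own string conversion keeps all nine fractional digits would need to be checked against the Spark version in use.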
Relevant code:
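import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.DataFrameWriter;
import org.apache.spark.sql.hive.HiveContext;

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("SAMPLE_APP");
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);
// Read the Hive table and write it out as a single CSV file with a header row
DataFrame df = hc.table("testdb.all_datetime_types");
df.printSchema();
DataFrameWriter writer = df.repartition(1).write();
writer.format("com.databricks.spark.csv").option("header", "true").save(outputHdfsFile);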
I know about the dateFormat option, but date and timestamp columns can have different formats in Hive.
Is it possible to simply cast all columns to string?