Remove double quotes when loading data into Amazon Redshift Spectrum
I want to load data into an Amazon Redshift external table. The data is in CSV format and has quotes. Is there something like REMOVEQUOTES, which we have in the COPY command, for Redshift external tables? Also, what are the options for loading fixed-length data into an external table?
To create an external Spectrum table, refer to the CREATE TABLE syntax provided by Athena. To load a CSV whose fields are enclosed in double quotes, use the following lines as the ROW FORMAT:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
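For context, here is a minimal sketch of how that clause might sit inside a complete statement. The schema `spectrum_schema`, the table and column names, and the S3 path are all hypothetical placeholders, and the sketch assumes an external schema has already been created. The columns are declared as VARCHAR on the assumption that the SerDe returns every field as text.

```sql
-- Hypothetical names throughout: spectrum_schema, quoted_csv, the columns,
-- and the S3 location are placeholders, not values from the question.
CREATE EXTERNAL TABLE spectrum_schema.quoted_csv (
    id   VARCHAR(32),
    name VARCHAR(128),
    city VARCHAR(128)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',   -- fields are comma-separated
    'quoteChar' = '\"',      -- fields are wrapped in double quotes
    'escapeChar' = '\\'      -- backslash escapes a quote inside a field
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/quoted-csv/';
```

With a definition along these lines, a row such as `"1","Ann","New York"` should be read with the surrounding quotes stripped, which is the effect REMOVEQUOTES has in COPY.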
For fixed-length files, use RegexSerDe. In this case, the relevant part of your CREATE TABLE statement will look like this (assuming 3 fields, each 100 characters long).
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.{100})(.{100})(.{100})')
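As a rough sketch, the fixed-length case could be written out in full as below. Again, `spectrum_schema`, the table and column names, and the S3 path are hypothetical, and an existing external schema is assumed. Each capture group in the regex maps to one column in order. Hive also ships this SerDe as `org.apache.hadoop.hive.serde2.RegexSerDe`; if the contrib class name is not accepted, that one may be worth trying.

```sql
-- Hypothetical names throughout: spectrum_schema, fixed_width, the columns,
-- and the S3 location are placeholders, not values from the question.
CREATE EXTERNAL TABLE spectrum_schema.fixed_width (
    field_a VARCHAR(100),   -- characters 1-100 of each line
    field_b VARCHAR(100),   -- characters 101-200
    field_c VARCHAR(100)    -- characters 201-300
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '(.{100})(.{100})(.{100})'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/fixed-width/';
```

Because `.{100}` captures exactly 100 characters, any padding spaces stay in the column values, so a `TRIM(field_a)` in queries is likely needed.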