Remove double quotes when loading data into Amazon Redshift Spectrum

I want to load data into an Amazon Redshift external table. The data is in CSV format and the values are enclosed in double quotes. Is there an equivalent of the REMOVEQUOTES option of the Redshift COPY command for external tables? Also, are there any options for loading fixed-length data into an external table?

1 answer


To create an external Spectrum table, refer to the CREATE TABLE syntax provided by Athena. To load CSV data whose values are enclosed in double quotes, use the following lines as the ROW FORMAT:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)
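
For context, here is a sketch of how those lines might sit inside a complete Redshift Spectrum statement. It has not been run against a live cluster: the external schema spectrum_schema, the table and column names, and the S3 location are hypothetical placeholders, and the external schema is assumed to already exist (created with CREATE EXTERNAL SCHEMA).

```sql
-- Sketch only: spectrum_schema, quoted_csv_example, the columns and the
-- S3 location are hypothetical placeholders.
CREATE EXTERNAL TABLE spectrum_schema.quoted_csv_example (
    id      VARCHAR(20),
    name    VARCHAR(100),
    notes   VARCHAR(500)
)
-- OpenCSVSerde parses quoted fields and drops the enclosing double quotes,
-- which plays the role REMOVEQUOTES plays for COPY.
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/path/to/csv/';
```

Declaring every column as VARCHAR is a conservative choice here; you can cast values to other types in your queries if needed.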

For fixed-length files, use RegexSerDe. In that case, the relevant part of your CREATE TABLE statement will look like this (assuming three fields, each 100 characters long):

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '(.{100})(.{100})(.{100})')
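
A fuller sketch for the fixed-length case might look like the following. The schema, table and column names and the S3 location are hypothetical, the layout assumes exactly three 100-character fields per line, and the SerDe class name (org.apache.hadoop.hive.serde2.RegexSerDe) and single-quoted property string are written the way I understand Redshift's CREATE EXTERNAL TABLE documentation to list them. Because each capture group takes exactly 100 characters, any padding spaces stay in the values, so the query at the end trims them.

```sql
-- Sketch only: schema, table, column names and location are hypothetical.
-- Each line is assumed to be exactly 300 characters: three 100-character fields.
CREATE EXTERNAL TABLE spectrum_schema.fixed_width_example (
    field1  VARCHAR(100),
    field2  VARCHAR(100),
    field3  VARCHAR(100)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- One capture group per column, matched to the columns in order.
WITH SERDEPROPERTIES ('input.regex' = '(.{100})(.{100})(.{100})')
STORED AS TEXTFILE
LOCATION 's3://your-bucket/path/to/fixed-width/';

-- The captured values keep any padding spaces, so trim them when querying.
SELECT TRIM(field1) AS field1,
       TRIM(field2) AS field2,
       TRIM(field3) AS field3
FROM spectrum_schema.fixed_width_example;
```

Using one capture group per column keeps the regex easy to adjust: for a different layout, change each {100} to that field's width.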