Redshift add column on import using COPY
In Amazon Redshift, I have a table where I need to load data from multiple CSV files:
create table my_table (
    id integer,
    name varchar(50) NULL,
    email varchar(50) NULL,
    processed_file varchar(256) NULL
);
The first three columns hold data from the files. The last column, processed_file,
indicates which file each record was imported from.
I have the files in Amazon S3 and I will import them using the COPY command. Something like:
COPY {table_name} FROM 's3://file-key'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;
Is there a way to fill the fourth column, processed_file, automatically with the COPY command, so that it stores the file name?
I can make an UPDATE statement after COPY, but I am dealing with a huge amount of data, so ideally I would like to avoid this if possible.
In fact, it is possible. Create the table and load the data without the extra processed_file_name
column, then add that column with a default value. Here is the complete process:
create table my_table (
    id integer,
    name varchar(50) NULL,
    email varchar(50) NULL
);
COPY {table_name} FROM 's3://file-key'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;
ALTER TABLE my_table ADD COLUMN processed_file_name varchar(256) NOT NULL DEFAULT '{file-name}';
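If you run this from a script, the two-step process above (COPY, then ALTER with the file name as the default) can be generated per file. This is only a sketch: the function name, the S3 key, and the credentials placeholder are hypothetical, and it assumes one file per load as in the answer above (a second ALTER on the same table would fail because the column already exists).

```python
def build_load_statements(table_name, s3_key,
                          credentials="aws_access_key_id=xxxx;aws_secret_access_key=xxxxx"):
    """Return the COPY and ALTER TABLE statements that load one S3 file
    and then record its name in a defaulted column (hypothetical helper)."""
    # Use only the file name, not the full S3 key, as the default value.
    file_name = s3_key.rsplit("/", 1)[-1]
    copy_sql = (
        f"COPY {table_name} FROM 's3://{s3_key}' "
        f"WITH CREDENTIALS '{credentials}' "
        "DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 "
        "ACCEPTINVCHARS '*' DELIMITER '\\t' GZIP;"
    )
    alter_sql = (
        f"ALTER TABLE {table_name} ADD COLUMN processed_file_name "
        f"varchar(256) NOT NULL DEFAULT '{file_name}';"
    )
    return [copy_sql, alter_sql]
```

Note that the ALTER runs after the COPY, so the default is applied to all rows already loaded from that file.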