Redshift add column on import using COPY

In Amazon Redshift, I have a table where I need to load data from multiple CSV files:

create table my_table (
  id integer,
  name varchar(50) NULL,
  email varchar(50) NULL,
  processed_file varchar(256) NULL
);

The first three columns hold data from the files. The last column, processed_file, indicates which file each record was imported from.

The files are in Amazon S3, and I will import them using the COPY command. Something like:

COPY {table_name} FROM 's3://file-key' 
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx' 
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;

Is there a way to fill the fourth column, processed_file, with the file name automatically as part of the COPY command?

I can run an UPDATE statement after each COPY, but I am dealing with a huge amount of data, so ideally I would like to avoid that if possible.
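
For reference, the per-file UPDATE I would like to avoid looks roughly like this, run immediately after each file's COPY and before loading the next one (the file name here is only a placeholder):

UPDATE my_table
SET processed_file = 'file1.csv.gz'
WHERE processed_file IS NULL;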

+3


2 answers


Actually, it is possible. Create and load the data without the extra processed_file_name column, then add the column with a default value. Here is the complete process:



create table my_table (
  id integer,
  name varchar(50) NULL,
  email varchar(50) NULL
);

COPY {table_name} FROM 's3://file-key' 
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx' 
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;

ALTER TABLE my_table ADD COLUMN processed_file_name varchar(256) NOT NULL DEFAULT '{file-name}';
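
The default added by ALTER TABLE stamps every row loaded so far, so this works best when all the rows loaded to that point should share the same label. If each file needs its own value, one possible variation is to COPY each file into a staging table and insert into the target with the file name as a literal; the bucket path, file name, and staging table below are only placeholders, and the target table is assumed to already have the processed_file_name column:

-- staging table matches the file layout (no file-name column)
CREATE TEMP TABLE staging_my_table (
  id integer,
  name varchar(50),
  email varchar(50)
);

-- load one file into the staging table
COPY staging_my_table FROM 's3://my-bucket/exports/file1.csv.gz'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;

-- move the rows into the target table, stamping the source file name
INSERT INTO my_table (id, name, email, processed_file_name)
SELECT id, name, email, 'file1.csv.gz'
FROM staging_my_table;

TRUNCATE staging_my_table;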

0


It's impossible.

You either need to preprocess the files (to include a file-name column) or update the data after loading (but then it would be difficult to bulk-load from multiple files at the same time, which is the most efficient way to load data into Redshift).
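
For context, the efficient bulk load mentioned above is a single COPY pointed at a key prefix (or a manifest file), which loads every matching file in parallel; a sketch with a hypothetical bucket and prefix:

COPY my_table FROM 's3://my-bucket/exports/'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;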



See: Redshift COPY command documentation

+4

