Bulk insert from CSV file - duplicate skipping

UPDATE: I ended up using this method created by Johnny Bubriski and modified it a bit to skip duplicates. It works like a charm and seems to be pretty fast. Link: http://johnnycode.com/2013/08/19/using-c-sharp-sqlbulkcopy-to-import-csv-data-sql-server/

I searched for an answer to this question but couldn't find one. I am doing a massive T-SQL bulk insert to load data from a csv file into a table in a local database. My statement looks like this:

BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',   -- fields are separated by semicolons
    ROWTERMINATOR = '0x0a',  -- rows end with a line feed
    FORMATFILE = 'formatfile.fmt',
    ERRORFILE = 'C:\ProgramData\Tools_TextileMagazine\AdditionalFiles\BulkInsertErrors.txt'
)
GO

SELECT *
FROM Orders
GO

I get an exception when I try to insert duplicate rows (for example, by importing the same csv file twice), which causes the entire insert to stop and roll back. Pretty self-explanatory, since I am violating the primary key constraint. Right now I just show a message letting users know that duplicates are present in the csv file, which is of course not a real solution. My question is: is there a way to ignore the duplicate rows, skip over them, and insert only the rows that are not duplicates? Perhaps by catching the exception somehow?

If this is not possible, what would be the "correct" (for lack of a better word) way to import data from a csv file? The exception is giving me a bit of trouble. I read somewhere that you can set up a temporary table, load the data into it, and deduplicate against the target table before inserting. But is there really no easier way to do this with bulk insert?



1 answer


You could set the MAXERRORS property high enough to allow the valid records to be inserted and the duplicates to be ignored. Unfortunately, this would mean that other errors in the dataset would not cause the load to fail either.
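As a rough sketch of that option applied to the statement from the question (the 100000 ceiling is an arbitrary value for illustration, not a recommendation):

BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    MAXERRORS = 100000  -- arbitrary ceiling: keep loading until this many rows have failed
)
GO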

Alternatively, you could set the BATCHSIZE property, which will load the data in multiple transactions; if there are duplicates, only the batch containing them is rolled back.
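Again as a sketch, with a hypothetical batch size of 1000 rows, so a duplicate only rolls back the 1000-row batch it appears in:

BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    BATCHSIZE = 1000  -- each batch of 1000 rows is committed in its own transaction
)
GO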



A safer but less efficient method is to bulk load the CSV file into a separate, empty staging table and then merge the new rows into the Orders table, as you mentioned. Personally, that's how I do it.
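A minimal sketch of that approach, assuming Orders has a single-column primary key; the Orders_Staging table name and the OrderId key column are placeholders for illustration:

-- Clone the column structure of Orders into an empty staging table
-- (SELECT ... INTO does not copy the primary key, so duplicates load fine)
SELECT * INTO Orders_Staging FROM Orders WHERE 1 = 0
GO

BULK INSERT Orders_Staging
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt'
)
GO

-- Copy over only the rows whose key is not already in Orders
INSERT INTO Orders
SELECT s.*
FROM Orders_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM Orders AS o WHERE o.OrderId = s.OrderId)
GO

DROP TABLE Orders_Staging
GO

Note that if the file itself can contain the same key twice, the final SELECT would also need a DISTINCT or a ROW_NUMBER() filter to deduplicate within the staging table.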

None of these solutions are perfect, but I can't think of a way to ignore duplicates in the BULK INSERT syntax itself.
