SQL import skips duplicates

I am trying to bulk load data into a SQL Server database. The source file contains duplicates that I want to drop, so I was hoping the load would automatically keep the first occurrence and discard the rest (I have set a unique key constraint). The problem is that as soon as a duplicate row is loaded, the whole operation fails and is rolled back. Is there a way to tell SQL Server to just continue?



3 answers

Try inserting the data into a temporary table first, and then either use SELECT DISTINCT as suggested by @madcolor, or

SELECT * FROM #tempTable tt
WHERE NOT EXISTS (SELECT 1 FROM yourTable yt WHERE yt.id = tt.id)


or compare on another field in the WHERE clause, whichever one carries the unique key.
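A minimal sketch of the staging-table idea, written in Python against an in-memory SQLite database so that it runs anywhere. SQLite is only a stand-in for SQL Server here, and the table names (`target`, `staging`), the columns (`id`, `name`) and the helper `load_skipping_duplicates` are all hypothetical. The `seq` column records load order so that the first occurrence of each id is the one that is kept.

```python
import sqlite3


def load_skipping_duplicates(conn, rows):
    """Stage rows, then copy only the first row per id that target lacks.

    Assumes a table target(id, name) already exists with a unique key on id.
    """
    cur = conn.cursor()
    # seq is an auto-assigned, ascending key that remembers load order.
    cur.execute(
        "CREATE TEMP TABLE staging ("
        "seq INTEGER PRIMARY KEY, id INTEGER, name TEXT)"
    )
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)
    cur.execute(
        """
        INSERT INTO target (id, name)
        SELECT s.id, s.name
        FROM staging s
        -- keep only the earliest staged row for each id
        WHERE s.seq = (SELECT MIN(s2.seq) FROM staging s2 WHERE s2.id = s.id)
          -- and skip ids that the target table already holds
          AND NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)
        """
    )
    cur.execute("DROP TABLE staging")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    load_skipping_duplicates(conn, [(1, "a"), (1, "a again"), (2, "b")])
    print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
```

Because the staging table has no unique constraint, the load itself cannot fail on a duplicate; the filtering happens in the single INSERT ... SELECT afterwards.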



If you are doing this with some SQL tool like SQL Plus or DBVis or Toad, I suspect not. If you are doing it programmatically in a language, then you need to divide and conquer. Presumably going line by line and catching each exception would take too long, so instead try the whole block of SQL as one batch first. If that fails, try the first half; if that fails, try the first half of the first half. Keep halving until you have a block that succeeds. Set that block aside and follow the same procedure on the rest of the SQL. Anything that violates the constraint will end up isolated as a single SQL statement, which you can log and discard. This imports the data with as much batch processing as possible while still identifying the invalid lines.
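The halving procedure described above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: `load_with_bisection` and `make_sqlite_inserter` are hypothetical helpers, SQLite again stands in for SQL Server, and `insert_batch` is assumed to be atomic, meaning it either inserts the whole block or raises and leaves the table unchanged.

```python
import sqlite3


def load_with_bisection(rows, insert_batch, constraint_error):
    """Insert rows in the largest blocks that succeed.

    Returns the rows that failed on their own, in the order they were tried.
    """
    rejected = []
    pending = [list(rows)]  # stack of blocks still to be tried
    while pending:
        block = pending.pop()
        if not block:
            continue
        try:
            insert_batch(block)
        except constraint_error:
            if len(block) == 1:
                # A single failing row is the offender: record it and move on.
                rejected.append(block[0])
            else:
                mid = len(block) // 2
                # Push the second half first so the first half is tried first,
                # which keeps the earliest occurrence of each key.
                pending.append(block[mid:])
                pending.append(block[:mid])
    return rejected


def make_sqlite_inserter(conn):
    """Build an atomic batch insert for a hypothetical target(id, name) table."""
    def insert_batch(block):
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO target (id, name) VALUES (?, ?)", block
            )
    return insert_batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()
    data = [(1, "a"), (2, "b"), (1, "dup"), (3, "c")]
    bad = load_with_bisection(
        data, make_sqlite_inserter(conn), sqlite3.IntegrityError
    )
    print("rejected:", bad)
```

When duplicates are rare, most rows go in as part of large blocks and only the neighbourhoods of bad rows are retried in smaller pieces. When duplicates are frequent, the retries add up and the staging-table approach is likely the cheaper option.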



Use SSIS for this. You can tell it to skip duplicates. But first, make sure they are true duplicates. What if the data in some columns differs? How do you know which row is the best one to store?


