Remove all but minimum values based on two columns in a SQL Server table
How can I write a statement to accomplish the following?
Let's say the table has two columns (both nvarchar) with the following data:
col1   col2
10000  10
10000  20
10001  10
10002  30
10002  40
10002  50
I would like to keep only the following data:
col1   col2
10000  10
10001  10
10002  30
That is, I want to remove the duplicates in col1 (neither column is a primary key), keeping for each col1 value only the record with the minimum value in col2.
How can I do this?
This should work for you:
;WITH NotMin AS
(
    SELECT Col1, Col2, MIN(Col2) OVER (PARTITION BY Col1) AS TheMin
    FROM Table1
)
DELETE Table1
--SELECT *
FROM Table1
INNER JOIN NotMin
    ON Table1.Col1 = NotMin.Col1 AND Table1.Col2 = NotMin.Col2
    AND Table1.Col2 != TheMin
This uses a CTE (similar to an inline view, but cleaner) and the OVER clause as a shortcut that saves some code. I also left a commented-out SELECT in place so you can preview the affected rows before deleting: swap it with the DELETE line. This will work in SQL Server 2005/2008.
Thanks Eric
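To illustrate the preview step from the CTE answer above, here is a minimal sketch that runs the SELECT variant against the question's sample data. The table variable @Table1 is a stand-in introduced only for illustration (it is not in the original answer), and the INSERT uses UNION ALL so that it should also run on SQL Server 2005.

```sql
-- Hypothetical stand-in for Table1, filled with the question's sample rows.
DECLARE @Table1 TABLE (Col1 nvarchar(10), Col2 nvarchar(10));

INSERT INTO @Table1 (Col1, Col2)
SELECT N'10000', N'10' UNION ALL
SELECT N'10000', N'20' UNION ALL
SELECT N'10001', N'10' UNION ALL
SELECT N'10002', N'30' UNION ALL
SELECT N'10002', N'40' UNION ALL
SELECT N'10002', N'50';

-- Preview only: list the rows that the DELETE would remove,
-- i.e. rows whose Col2 is not the minimum within their Col1 group.
WITH NotMin AS
(
    SELECT Col1, Col2, MIN(Col2) OVER (PARTITION BY Col1) AS TheMin
    FROM @Table1
)
SELECT Col1, Col2, TheMin
FROM NotMin
WHERE Col2 <> TheMin;
```

With this sample data the preview lists three rows: (10000, 20), (10002, 40) and (10002, 50), which are exactly the rows the question wants removed.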
Sorry, I misunderstood the question.
SELECT col1, MIN(col2) AS col2
FROM test
GROUP BY col1
This returns the rows you want to keep. To delete the others, assuming you cannot modify the table to add a unique ID, you would need to do something like:
DELETE FROM test
WHERE col1 + '|' + col2 NOT IN
(SELECT col1 + '|' + MIN(col2)
FROM test
GROUP BY col1)
This should work, assuming the pipe symbol never appears in your data.
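One caveat applies to every MIN(col2) query here: the question says both columns are nvarchar, so MIN compares the values as text rather than as numbers. The sample values all have two digits, so the result is the same either way, but mixed-length values would behave differently. A small sketch, using a hypothetical table variable @t and assuming the values are always valid integers so the CAST is safe:

```sql
-- Hypothetical two-row sample showing text vs. numeric ordering.
DECLARE @t TABLE (col2 nvarchar(10));

INSERT INTO @t (col2)
SELECT N'100' UNION ALL
SELECT N'20';

-- MIN on nvarchar compares character by character, so '100' sorts before '20'.
-- Casting to int (an assumption about the data) gives the numeric minimum.
SELECT MIN(col2)              AS StringMin,
       MIN(CAST(col2 AS int)) AS NumericMin
FROM @t;
```

Here StringMin is '100' while NumericMin is 20. If the real data can mix lengths, the cast would need to be applied consistently in both the subquery and the outer comparison.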
Ideally, you would like to say:
DELETE
FROM tbl
WHERE (col1, col2) NOT IN (SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1)
Unfortunately, a row-value comparison like this is not allowed in T-SQL, but there is a proprietary double FROM extension (shown here with EXCEPT for clarity):
DELETE
FROM tbl
FROM tbl
INNER JOIN (
    SELECT col1, col2 FROM tbl
    EXCEPT
    SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1
) AS d
    ON tbl.col1 = d.col1 AND tbl.col2 = d.col2
Alternatively, concatenate the two columns into a single key:
DELETE
FROM tbl
WHERE col1 + '|' + col2 NOT IN (SELECT col1 + '|' + MIN(col2) FROM tbl GROUP BY col1)
Other workarounds are possible as well.
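One such workaround, offered as a sketch rather than as part of the original answer, is a correlated EXISTS. It avoids both the row-value comparison and the concatenated key, so it does not depend on a delimiter such as the pipe being absent from the data. The alias name other is chosen here purely for illustration.

```sql
-- Delete every row for which the same col1 has a smaller col2.
-- Rows holding the minimum col2 of their group have no such match and survive.
DELETE FROM tbl
WHERE EXISTS (
    SELECT 1
    FROM tbl AS other
    WHERE other.col1 = tbl.col1
      AND other.col2 < tbl.col2
);
```

Like the MIN-based versions, this compares col2 as text because the column is nvarchar, and it leaves exact duplicates of the minimum row in place.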