SQL Find Possible Duplicates
I need SQL code that will identify possible duplicates in a table. Let's say my table has 4 columns:
- Identifier (primary key)
- Date1
- Date2
- GroupID
(Date1, Date2, GroupID) form a unique key.
This table is loaded one block of data at a time, and a new block often contains several records that are already there. That is fine as long as the unique key catches them. Unfortunately, Date1 is sometimes empty (or at least "1900/01/01") in either the first load or a later one.
So what I need is to find the rows where the combination (Date2, GroupID) appears more than once and where, for one of the records, Date1 = '1900/01/01'.
Thanks,
Charles
You can identify duplicate (Date2, GroupID) combinations with:
SELECT Date2, GroupID
FROM t
GROUP BY Date2, GroupID
HAVING COUNT(*) > 1
Use this to identify records in the main table that are duplicated:
SELECT *
FROM t
WHERE Date1 = '1900/01/01'
  -- IN rather than =, because the subquery can return more than one row
  AND (Date2, GroupID) IN (SELECT Date2, GroupID
                           FROM t
                           GROUP BY Date2, GroupID
                           HAVING COUNT(*) > 1)
NOTE: Since (Date1, Date2, GroupID) form a unique key, check whether your design should really allow Date1 to be empty (NULL or a placeholder date). You might also have genuine cases where Date1 differs between two rows while (Date2, GroupID) is the same.
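The two queries above can be tried on a small in-memory table. The following is only a sketch, written in Python with the standard sqlite3 module: the table name t, the sample rows, and the dates are invented for illustration, and the row-value comparison is written as a join on the grouped subquery, since not every database accepts (Date2, GroupID) IN (...).

```python
import sqlite3

# Hypothetical table and sample rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t (
        Identifier INTEGER PRIMARY KEY,
        Date1      TEXT NOT NULL,
        Date2      TEXT NOT NULL,
        GroupID    INTEGER NOT NULL,
        UNIQUE (Date1, Date2, GroupID)
    )
    """
)
conn.executemany(
    "INSERT INTO t (Identifier, Date1, Date2, GroupID) VALUES (?, ?, ?, ?)",
    [
        (1, "2009/03/01", "2009/04/01", 10),  # real Date1
        (2, "1900/01/01", "2009/04/01", 10),  # placeholder twin of row 1
        (3, "2009/03/05", "2009/04/02", 10),  # no twin
        (4, "1900/01/01", "2009/04/03", 20),  # placeholder, but no twin
    ],
)

# Step 1: (Date2, GroupID) pairs that occur more than once.
dup_pairs = conn.execute(
    """
    SELECT Date2, GroupID
    FROM t
    GROUP BY Date2, GroupID
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: placeholder rows whose (Date2, GroupID) pair is duplicated.
suspects = conn.execute(
    """
    SELECT t.Identifier
    FROM t
    JOIN (
        SELECT Date2, GroupID
        FROM t
        GROUP BY Date2, GroupID
        HAVING COUNT(*) > 1
    ) AS d
      ON d.Date2 = t.Date2 AND d.GroupID = t.GroupID
    WHERE t.Date1 = '1900/01/01'
    ORDER BY t.Identifier
    """
).fetchall()

print(dup_pairs)
print(suspects)
```

With this sample data only row 2 is reported: row 4 also carries the placeholder date, but its (Date2, GroupID) pair occurs only once.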
If I understand correctly, you are looking for groups of rows that share the same GroupID and Date2, where at most one row has a Date1 other than 1900/01/01 and all the other rows have Date1 = 1900/01/01 (or no Date1 at all).
If I understood correctly, here is a query for you:
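-- t stands for your table name
SELECT T1.Identifier
FROM t T1
WHERE
  -- the group contains at least one row with a placeholder or missing Date1
  (T1.GroupID, T1.Date2) IN
    (SELECT T2.GroupID, T2.Date2
     FROM t T2
     WHERE T2.Date1 = '1900/01/01' OR
           T2.Date1 IS NULL
     GROUP BY T2.GroupID, T2.Date2)
  AND
  -- the group contains at most one row with a real Date1
  1 >=
    (
     SELECT COUNT(*)
     FROM t T3
     WHERE NOT (T3.Date1 = '1900/01/01')
       AND NOT (T3.Date1 IS NULL)
       AND T3.GroupID = T1.GroupID
       AND T3.Date2 = T1.Date2
    )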
Hope it helps.
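As a hedged sketch of the same idea, here is a variant that runs on SQLite through Python's sqlite3 module. It replaces the row-value IN with EXISTS for portability and adds a third condition, not part of the query above, requiring the group to hold more than one row, so that a lone placeholder row is not reported. The table name t and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table and rows; no UNIQUE key here so a NULL Date1 can be shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "Identifier INTEGER PRIMARY KEY, Date1 TEXT, Date2 TEXT, GroupID INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/03/01", "2009/04/01", 10),  # the one real Date1 in its group
        (2, "1900/01/01", "2009/04/01", 10),  # placeholder Date1
        (3, None,         "2009/04/01", 10),  # missing Date1
        (4, "2009/03/05", "2009/04/02", 10),  # two real dates, same pair:
        (5, "2009/03/06", "2009/04/02", 10),  # treated as genuine, not flagged
        (6, "1900/01/01", "2009/04/03", 20),  # lone placeholder, no twin
    ],
)

flagged = conn.execute(
    """
    SELECT T1.Identifier
    FROM t AS T1
    WHERE EXISTS (              -- group has a placeholder or NULL Date1
            SELECT 1
            FROM t AS T2
            WHERE T2.GroupID = T1.GroupID
              AND T2.Date2 = T1.Date2
              AND (T2.Date1 = '1900/01/01' OR T2.Date1 IS NULL)
          )
      AND 1 >= (                -- at most one row with a real Date1
            SELECT COUNT(*)
            FROM t AS T3
            WHERE T3.GroupID = T1.GroupID
              AND T3.Date2 = T1.Date2
              AND T3.Date1 IS NOT NULL
              AND T3.Date1 <> '1900/01/01'
          )
      AND 1 < (                 -- added assumption: group has several rows
            SELECT COUNT(*)
            FROM t AS T4
            WHERE T4.GroupID = T1.GroupID
              AND T4.Date2 = T1.Date2
          )
    ORDER BY T1.Identifier
    """
).fetchall()

print(flagged)
```

On this data the query flags rows 1, 2 and 3, that is, every member of the one group that mixes a real Date1 with placeholder or missing values.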
In addition to the PRIMARY KEY defined on the table, you can add UNIQUE constraints to accomplish what you are asking for. They ensure that a particular column or set of columns holds unique values across the table.
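To make the effect of such a constraint concrete, here is a small sketch using SQLite through Python's sqlite3 module (SQLite rather than MySQL is an assumption made only so the example is self-contained; the table name t and the dates are invented). It shows that a UNIQUE key on (Date1, Date2, GroupID) rejects an exact reload, but still accepts the same record when Date1 arrives as the placeholder 1900/01/01.

```python
import sqlite3

# Hypothetical schema mirroring the question's four columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t (
        Identifier INTEGER PRIMARY KEY,
        Date1      TEXT NOT NULL,
        Date2      TEXT NOT NULL,
        GroupID    INTEGER NOT NULL,
        UNIQUE (Date1, Date2, GroupID)
    )
    """
)
insert = "INSERT INTO t (Date1, Date2, GroupID) VALUES (?, ?, ?)"

# First load.
conn.execute(insert, ("2009/03/01", "2009/04/01", 10))

# Reloading the identical record violates the UNIQUE key.
try:
    conn.execute(insert, ("2009/03/01", "2009/04/01", 10))
    exact_reload_rejected = False
except sqlite3.IntegrityError:
    exact_reload_rejected = True

# The same record with a placeholder Date1 is a different key, so it gets in.
conn.execute(insert, ("1900/01/01", "2009/04/01", 10))
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(exact_reload_rejected, row_count)
```

So the constraint alone does not catch the placeholder case; those rows still have to be found with a query on (Date2, GroupID).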
Note the entry in the MySQL manual for an example: