SQL Find Possible Duplicates

Question


I need SQL code that will identify possible duplicates in a table. Let's say my table has 4 columns:

Identifier (primary key)
Date1
Date2
GroupID

(Date1, Date2, GroupID) form a unique key.

This table is filled one block of data at a time, and a newly loaded block often contains several records that are already there. That's fine as long as the unique key catches them. Unfortunately, Date1 is sometimes empty (or rather set to "1900/01/01") in either the first load or a later one, so the unique key does not catch the repeat.
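A minimal sketch of why the unique key misses these rows. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the real database, a table named t, and the column names id, date1, date2 and groupid (id standing in for Identifier); the sample rows are made up. An exact repeat is rejected, but a reload whose Date1 arrived as the placeholder is accepted because (date1, date2, groupid) is still unique.

```python
import sqlite3

# Hypothetical table mirroring the question; SQLite stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        id      INTEGER PRIMARY KEY,  -- "Identifier" in the question
        date1   TEXT,
        date2   TEXT,
        groupid INTEGER,
        UNIQUE (date1, date2, groupid)
    )
""")

# First load: Date1 is filled in (made-up sample values).
conn.execute("INSERT INTO t VALUES (1, '2009/08/01', '2009/08/20', 10)")

# An exact repeat of that record is caught by the unique key.
try:
    conn.execute("INSERT INTO t VALUES (2, '2009/08/01', '2009/08/20', 10)")
    exact_repeat_rejected = False
except sqlite3.IntegrityError:
    exact_repeat_rejected = True

# A later load with the placeholder Date1 slips through: date1 differs,
# so (date1, date2, groupid) is still unique.
conn.execute("INSERT INTO t VALUES (3, '1900/01/01', '2009/08/20', 10)")

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(row_count, exact_repeat_rejected)  # 2 True
```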

So what I need is to find where the combination (Date2, GroupID) appears more than once and where, for one of the records, Date1 = '1900/01/01'.

Thanks

Charles


Karl 25 Aug '09 at 5:21


7 replies

You can identify duplicate (date2, GroupID) pairs with:

Select date2, GroupID
from t
group by date2, GroupID
having count(*) > 1

Use this to identify records in the main table that are duplicated:

Select *
from t
where date1 = '1900/01/01'
and (date2, GroupID) in (Select date2, GroupID
                         from t
                         group by date2, GroupID
                         having count(*) > 1)

NOTE: Since Date1, Date2 and GroupID form a unique key, check whether your design is right to allow Date1 to be NULL (or a placeholder). You might have a genuine case where Date1 differs between two rows while (Date2, GroupID) is the same.
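A sketch that runs both queries against a few made-up rows, assuming SQLite via Python's sqlite3 module as the database and the hypothetical names t, id, date1, date2 and groupid. The row-value comparison (date2, groupid) IN (...) needs SQLite 3.15 or later; not every database supports it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/08/01", "2009/08/20", 10),  # real record
        (2, "1900/01/01", "2009/08/20", 10),  # reload of record 1 with placeholder date1
        (3, "2009/08/02", "2009/08/21", 11),  # no duplicate
        (4, "1900/01/01", "2009/08/22", 12),  # placeholder date1, but no duplicate
    ],
)

# Step 1: (date2, groupid) pairs that occur more than once.
dup_keys = conn.execute("""
    SELECT date2, groupid
    FROM t
    GROUP BY date2, groupid
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: placeholder rows whose pair is in that list. IN rather than =,
# because the subquery can return several pairs.
rows = conn.execute("""
    SELECT *
    FROM t
    WHERE date1 = '1900/01/01'
      AND (date2, groupid) IN (SELECT date2, groupid
                               FROM t
                               GROUP BY date2, groupid
                               HAVING COUNT(*) > 1)
""").fetchall()

print(dup_keys)  # [('2009/08/20', 10)]
print(rows)      # [(2, '1900/01/01', '2009/08/20', 10)]
```

Row 4 is not returned even though its Date1 is the placeholder, because its (date2, groupid) pair occurs only once.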


bkm 25 Aug '09 at 5:28


If I understand correctly, you are looking for groups of IDs that share the same GroupID and Date2, where one row has a Date1 other than 1900/01/01 and all the other rows have Date1 = 1900/01/01.

If so, here is a query for you:

SELECT T1.ID
FROM t T1
WHERE
    (T1.GroupID, T1.Date2) IN
        (SELECT T2.GroupID, T2.Date2
         FROM t T2
         WHERE T2.Date1 = '1900/01/01'
            OR T2.Date1 IS NULL
         GROUP BY T2.GroupID, T2.Date2)
AND
    1 >=
        (SELECT COUNT(*)
         FROM t T3
         WHERE NOT (T3.Date1 = '1900/01/01')
           AND NOT (T3.Date1 IS NULL)
           AND T3.GroupID = T1.GroupID
           AND T3.Date2 = T1.Date2)

Hope it helps.
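To see what this query returns, here is a sketch assuming SQLite via Python's sqlite3 module (row values need SQLite 3.15+), a table t with the hypothetical columns ID, Date1, Date2 and GroupID, and made-up sample rows. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, Date1 TEXT, Date2 TEXT, GroupID INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/08/01", "2009/08/20", 10),  # real record
        (2, "1900/01/01", "2009/08/20", 10),  # placeholder copy of row 1
        (3, "2009/08/02", "2009/08/21", 11),  # no placeholder in its group
        (4, "1900/01/01", "2009/08/22", 12),  # placeholder, alone in its group
    ],
)

ids = [r[0] for r in conn.execute("""
    SELECT T1.ID
    FROM t T1
    WHERE (T1.GroupID, T1.Date2) IN
          (SELECT T2.GroupID, T2.Date2
           FROM t T2
           WHERE T2.Date1 = '1900/01/01' OR T2.Date1 IS NULL
           GROUP BY T2.GroupID, T2.Date2)
      AND 1 >= (SELECT COUNT(*)
                FROM t T3
                WHERE NOT (T3.Date1 = '1900/01/01')
                  AND NOT (T3.Date1 IS NULL)
                  AND T3.GroupID = T1.GroupID
                  AND T3.Date2 = T1.Date2)
    ORDER BY T1.ID
""")]
print(ids)  # [1, 2, 4]
```

Row 4 comes back as well: the query does not require the (GroupID, Date2) pair to occur more than once, so an extra COUNT(*) > 1 condition may be needed if lone placeholder rows should be excluded.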


Roee adler 25 Aug '09 at 5:33


You might be able to do this with a check constraint.

Something along the lines of select count(*) from t where date1 = '1900/01/01' and date2 = @date2 and groupid = @groupid

You would just need to see whether your database lets you do that in a table-level constraint.
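SQLite, like PostgreSQL and MySQL, does not accept subqueries inside a CHECK constraint, so this sketch swaps in a different technique: a BEFORE INSERT trigger that aborts the insert. It assumes Python's sqlite3 module and the hypothetical names t, id, date1, date2, groupid and t_no_placeholder_dup. The rule it enforces (reject a row when the same (date2, groupid) already exists and either row carries the 1900/01/01 placeholder) is one reading of the question, not the answerer's exact constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")

# Hypothetical trigger: abort the insert when the same (date2, groupid) exists
# and either the existing row or the new row has the placeholder date1.
conn.execute("""
    CREATE TRIGGER t_no_placeholder_dup
    BEFORE INSERT ON t
    BEGIN
        SELECT RAISE(ABORT, 'possible duplicate of (date2, groupid)')
        WHERE EXISTS (
            SELECT 1 FROM t
            WHERE date2 = NEW.date2
              AND groupid = NEW.groupid
              AND (date1 = '1900/01/01' OR NEW.date1 = '1900/01/01')
        );
    END
""")

conn.execute("INSERT INTO t VALUES (1, '2009/08/01', '2009/08/20', 10)")

# Reload of the same record with a placeholder date1: the trigger aborts it.
try:
    conn.execute("INSERT INTO t VALUES (2, '1900/01/01', '2009/08/20', 10)")
    placeholder_rejected = False
except sqlite3.IntegrityError as exc:
    placeholder_rejected = True
    print("rejected:", exc)

# A genuinely different date1 for the same pair is still accepted.
conn.execute("INSERT INTO t VALUES (3, '2009/08/05', '2009/08/20', 10)")
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(row_count)  # 2
```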


LRE 25 Aug '09 at 5:24


Besides the PRIMARY KEY defined on the table, you can also add other UNIQUE constraints to accomplish what you are asking for. They ensure that a particular column or set of columns holds unique values across the table.

See the MySQL manual entry for an example:

http://dev.mysql.com/doc/refman/5.1/en/create-table.html
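A sketch of the UNIQUE-constraint approach, assuming SQLite via Python's sqlite3 module rather than MySQL, and the hypothetical names t, id, date1, date2 and groupid. Putting the extra constraint on (date2, groupid) rejects the placeholder reload, but it also rejects a second row whose Date1 genuinely differs, which is the trade-off bkm's note points at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the extra UNIQUE constraint covers only (date2, groupid).
conn.execute("""
    CREATE TABLE t (
        id      INTEGER PRIMARY KEY,
        date1   TEXT,
        date2   TEXT,
        groupid INTEGER,
        UNIQUE (date2, groupid)
    )
""")

conn.execute("INSERT INTO t VALUES (1, '2009/08/01', '2009/08/20', 10)")

rejected = []
for row in [
    (2, "1900/01/01", "2009/08/20", 10),  # placeholder reload of row 1
    (3, "2009/08/05", "2009/08/20", 10),  # different real date1, same pair
]:
    try:
        conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

print(rejected)  # [2, 3]
```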


Brent nash 25 Aug '09 at 5:27


select a.* from t a
join (
select Date2, GroupID, count(*) as cnt
from t
group by Date2, GroupID
having count(*) > 1
) b on (a.Date2 = b.Date2 and a.GroupID = b.GroupID)
where a.Date1 = '1900/01/01'
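A sketch of this join against a derived table, assuming SQLite via Python's sqlite3 module, the hypothetical names t, id, date1, date2 and groupid, and made-up sample rows. The derived table b lists the (date2, groupid) pairs that occur more than once; joining to it and filtering on the placeholder leaves only the suspect rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/08/01", "2009/08/20", 10),
        (2, "1900/01/01", "2009/08/20", 10),  # placeholder copy of row 1
        (3, "2009/08/02", "2009/08/21", 11),
        (4, "1900/01/01", "2009/08/22", 12),  # placeholder, but unique pair
    ],
)

# Join each row to the pairs that occur more than once, keep placeholder rows.
rows = conn.execute("""
    SELECT a.*
    FROM t a
    JOIN (SELECT date2, groupid, COUNT(*) AS cnt
          FROM t
          GROUP BY date2, groupid
          HAVING COUNT(*) > 1) b
      ON a.date2 = b.date2 AND a.groupid = b.groupid
    WHERE a.date1 = '1900/01/01'
""").fetchall()
print(rows)  # [(2, '1900/01/01', '2009/08/20', 10)]
```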


wgpubs 25 Aug '09 at 5:32


This is the easiest way to do it:

SELECT DISTINCT t1.*
FROM t t1 JOIN t t2 USING (date2, groupid)
WHERE t1.date1 = '1900/01/01'
  AND t1.id <> t2.id;

You don't need GROUP BY, which doesn't work well on some brands of database.
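A sketch of the self-join, assuming SQLite via Python's sqlite3 module, the hypothetical names t, id, date1, date2 and groupid, and made-up sample rows. It runs the query with and without the t1.id <> t2.id condition: without it, every placeholder row joins to itself and is reported even when nothing else shares its (date2, groupid) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/08/01", "2009/08/20", 10),
        (2, "1900/01/01", "2009/08/20", 10),  # placeholder copy of row 1
        (3, "2009/08/02", "2009/08/21", 11),
        (4, "1900/01/01", "2009/08/22", 12),  # placeholder, but unique pair
    ],
)

def placeholder_dups(exclude_self):
    # Self-join on (date2, groupid); optionally stop a row from matching itself.
    sql = """
        SELECT DISTINCT t1.*
        FROM t t1 JOIN t t2 USING (date2, groupid)
        WHERE t1.date1 = '1900/01/01'
    """
    if exclude_self:
        sql += " AND t1.id <> t2.id"
    return conn.execute(sql + " ORDER BY t1.id").fetchall()

with_self = [r[0] for r in placeholder_dups(exclude_self=False)]
without_self = [r[0] for r in placeholder_dups(exclude_self=True)]
print(with_self, without_self)  # [2, 4] [2]
```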


Bill karwin 25 Aug '09 at 5:42


SquareCog · Accepted Answer · 2009-08-25T05:33:14+0000

bkm's answer looks right, but the inner select (subquery) may not work well in some databases. This is more straightforward:

select t1.*
from t as t1 join t as t2
on (t1.date2 = t2.date2 and t1.groupid = t2.groupid)
where t1.id <> t2.id and (t1.date1 = '1900/01/01' or t2.date1 = '1900/01/01')
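A sketch of this self-join, assuming SQLite via Python's sqlite3 module, the hypothetical names t, id, date1, date2 and groupid, and made-up sample rows. Because the placeholder may sit on either side of the pair, both the real row and its placeholder copy are returned; ORDER BY is added only for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009/08/01", "2009/08/20", 10),
        (2, "1900/01/01", "2009/08/20", 10),  # placeholder copy of row 1
        (3, "2009/08/02", "2009/08/21", 11),
        (4, "1900/01/01", "2009/08/22", 12),  # placeholder, but unique pair
    ],
)

# Same (date2, groupid), different ids, placeholder date1 on either side.
ids = [r[0] for r in conn.execute("""
    SELECT t1.*
    FROM t AS t1
    JOIN t AS t2
      ON (t1.date2 = t2.date2 AND t1.groupid = t2.groupid)
    WHERE t1.id <> t2.id
      AND (t1.date1 = '1900/01/01' OR t2.date1 = '1900/01/01')
    ORDER BY t1.id
""")]
print(ids)  # [1, 2]
```

If a row has more than one partner, it appears once per partner; adding DISTINCT would collapse those repeats.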
