Compare every record of one table with sql server and return duplicates
I have a table as shown below. I wanted to get duplicate records. If the condition is if date2 and date4 have the same dates or dates within less than or equal to 10 days of each other, then the entries are duplicated. I have about 2000 records in the database. An example is shown here with examples. Date1 can be ignored. It may be the same date or different.
ID Number Code Type date1 Date2 date3 Date4 status shortname CP deferred
1 EO2 C TO 9/20/2000 9/1/2010 9/18/2010 9/1/2010 Archi Ordinary 58.65586 0
2 EO2 C TO 9/20/2000 9/5/2010 9/18/2010 9/5/2010 Archi Ordinary 58.65586 0
3 EO2 C TO 9/21/2000 9/10/2010 9/18/2010 9/10/2010 Archi Ordinary 58.65586 0
4 EO2 C TO 9/21/2000 9/24/2010 9/18/2010 9/24/2010 Archi Ordinary 58.65586 0
I wrote below request:
select * from T a
join T b on a.ID = b.ID
where a.[Number] = b.[Number] and a.ID >1
Also, I linked this:
SELECT * FROM T a WHERE a.Id IN (SELECT b.Id FROM T b)
EXCEPT
SELECT * FROM T a
The problem is that I cannot find a way where each string can compare against each other with the date condition above. I should get a result similar to below as a duplicate:
Number Code Type date1 Date2 date3 Date4 status shortname CP deferred
EO2 C TO 9/20/2000 9/1/2010 9/18/2010 9/1/2010 Archi Ordinary 58.65586 0
EO2 C TO 9/20/2000 9/5/2010 9/18/2010 9/5/2010 Archi Ordinary 58.65586 0
EO2 C TO 9/21/2000 9/10/2010 9/18/2010 9/10/2010 Archi Ordinary 58.65586 0
Please help. Thanks.
source to share
This is how I solved it. Thanks for the help.
select t1.Number,t1.Code,t1.Type,t1.date1,t1.Date2,t1.date3,t1.Date4,t1.stats,t1.shortname,t1.CP, t1.deferred
from T t1 join T t2 on 1=1 and t2.Number = t1.Number and t2.Code = t1.Code and t2.Type = t1.Type and t2.deferred = t1.eferred and t2.CP = t1.CP and t2.status = t1.status and abs(datediff(day,t1.Date2,t2.Date2)) <=10 and abs(datediff(day,t1.date3,t2.date3)) <=10 and abs(datediff(day,t1.Date4,t2.Date4)) <=10 group by t1.Number, t1.Code,t1.Type , t1.date1, t1.Date2,t1.date3,t1.Date4 ,t1.status,t1.shortname,t1.CP, t1.deferred having count(*) > 1
source to share
You can use join
to find other identical strings. A join
tries to match all rows from the right table based on a condition on
. For example:
select *
from YourTable t1
join YourTable t2
on t1.ID < t2.ID -- Must be different rows
-- Smaller than presents the duplicates once
-- Otherwise you'd get both 1,3 and 3,1
and abs(datediff(day, t1.date2, t2.date2)) <= 10
and abs(datediff(day, t1.date4, t2.date4)) <= 10
and t1.Number = t2.Number
and t1.Code = t2.Code
and -- So on for every column that should be equal
To ignore a column, omit it from the condition on
.
source to share