Remove duplicate numbers Status exists?
How do I remove a duplicate to a new table?
For example, in the table:
id | number | status | statusdate
1 | 12121312 | Processing | 01/06/2015 12:32:00
2 | 12121312 | Processing | 02/06/2015 13:37:00
3 | 12121312 | Processing | 03/06/2015 14:29:00
4 | 99999999 | Processing | 03/06/2015 15:33:00
5 | 99999999 | Completed | 03/06/2015 18:39:00
6 | 99999999 | Completed | 04/06/2015 19:39:00
-
There
id
is a duplicate number between 1 and 3, but there is no Completed status, so get the number from the oldest date -
There
id
is a duplicate number between 4 and 6, but a Completed status exists. Get number from newest date
In a new table, it should look something like this:
id | number | status | statusdate
1 | 12121312 | Processing | 01/06/2015 12:32:00
6 | 99999999 | Completed | 04/06/2015 19:39:00
How can I do that?
source to share
Here's an unoptimized version:
DELETE FROM table_name as T
WHERE (
number in (SELECT number
FROM table_name
WHERE status = 'Completed')
AND statusdate < (SELECT MAX(statusdate)
FROM table_name
GROUP BY number
HAVING number = T.number)
)
OR
(
number in (SELECT number
FROM table_name
WHERE status = 'Processing')
AND statusdate > (SELECT MIN(statusdate)
FROM table_name
GROUP BY number
HAVING number = T.number)
);
These are the entries it removes:
id number status statusdate
2 12121312 Processing February, 06 2015 13:37:00
3 12121312 Processing March, 06 2015 14:29:00
4 99999999 Processing March, 06 2015 15:33:00
5 99999999 Completed March, 06 2015 18:39:00
6 99999999 Completed April, 06 2015 19:39:00
Just replace SELECT * FROM
with DELETE FROM
in the Fiddle .
source to share
To select records without duplicates:
select t1.*
from tableName t1 JOIN
tablename t2 on t1.number=t2.number
and t1.status>=t2.status
and t1.statusdate>t2.statusdate
group by t1.status
order by id
SQL Fiddle result
To remove duplicates from a table:
delete from tableName
where id NOT IN
(select t1.id
from tableName t1 JOIN
tablename t2 on t1.number=t2.number
and t1.status>=t2.status
and t1.statusdate>t2.statusdate
group by t1.status) as Tbl
source to share
Assuming the items are added to the database in chronological order, this will give you a list of the ids you need:
SELECT COALESCE(MAX(IF(status='Completed',id,NULL)),MIN(IF(status='Processing',id,NULL)))
FROM yourtable
GROUP BY number;
Then you can select the following lines:
SELECT * FROM yourtable
WHERE id IN (select coalesce(max(if(status='Completed',id,null)),min(if(status='Processing',id,null)))
from yourtable
group by number);
Or remove others if needed.
source to share