SQL Removing duplicates one row at a time

I have a table where I save all the row changes that have ever happened. The problem is that there was an error at the start of the application that was creating a bunch of copies of each line.

The table looks something like this:

copies
|ID |CID |DATA
| 1 | 1  |  DA
| 2 | 2  |  DO
| 2 | 3  |  DO (copy of CID 2)
| 1 | 4  |  DA (copy of CID 1)
| 2 | 5  |  DA
| 1 | 6  |  DA (copy of CID 1)
| 2 | 7  |  DO

      

CID UNIQUE in tabular copies.

I want to remove all duplicate DATA GROUP BY IDs that are sorted by CID.

As you can see in the table, CIDs 2 and 3 are the same and they are next to each other. I would like to remove CID 3. Same with CID 4 and CID 6; they have no ID 1 in between and are copies of CID 1.

After removing the duplicates, I would like the table to look like this:

copies
|ID |CID |DATA
| 1 | 1  |  DA
| 2 | 2  |  DO
| 2 | 5  |  DA
| 2 | 7  |  DO

      

Any suggestions?:)

I think my question was heavily asked because the answer that seems to be the best gives this result:

ID   | DATA | DATA | DATA | DATA | DATA |     DATA |        CID          |
                                                   |Expected |  Quassnoi |
1809 |    1 |    0 |    1 |    0 |    0 |     NULL |  252227 |    252227 |
1809 |    1 |    0 |    1 |    1 |    0 |     NULL |  381530 |    381530 |
1809 |    1 |    0 |    1 |    0 |    0 |     NULL |  438158 | (missing) |
1809 |    1 |    0 |    1 |    0 | 1535 | 20090113 |  581418 |    581418 |
1809 |    1 |    1 |    1 |    0 | 1535 | 20090113 |  581421 |    581421 |

      

CID 252227 AND CID 438158 are duplicates, but since CID 381530 is in between; I want to keep this. These are only duplicates, which are located directly one after another when ordering by CID and ID.

+2


source to share


4 answers


DELETE   c.*
FROM     copies c
JOIN     (
         SELECT  id, data, MIN(copies) AS minc
         FROM    copies
         GROUP BY
                 id, data
         ) q
ON       c.id = q.id
         AND c.data = q.data
         AND c.cid <> q.minc

      

Update:

DELETE  c.*
FROM    (
        SELECT  cid
        FROM    (
                SELECT  cid,
                        COALESCE(data1 = @data1 AND data2 = @data2, FALSE) AS dup,
                        @data1 := data1,
                        @data2 := data2
                FROM    (
                        SELECT  @data1 := NULL,
                                @data2 := NULL
                        ) vars, copies ci
                ORDER BY
                        id, cid
                ) qi
        WHERE   dup
        ) q
JOIN    copies c
ON      c.cid = q.cid

      

This is empoys solution of session MySQL

variables.



There is a clean solution ANSI

that will use NOT EXISTS

, however it will be slow due to the way the optimizer works MySQL

(it will not use the accessor range

in the correlated subquery).

See this article on my blog for details on performance for a fairly similar task:

+5


source


You can use count

in a subquery for this:



delete from copies
where
    (select count(*) from copies s where s.id = copies.id 
                                   and s.data = copies.data 
                                   and s.cid > copies.cid) > 0

      

+1


source


// EDITED for @Jonathan Leffler comment
//$sql = "SELECT ID,CID,DATA FROM copies ORDER BY CID, ID";
$sql = "SELECT ID,CID,DATA FROM copies ORDER BY ID, CID";
$result = mysql_query($sql, $link); 
$data = "";
$id = "";
while ($row = mysql_fetch_row($result)){ 
       if (($row[0]!=$id) && ($row[2]!=$data) && ($id!="")){
            $sql2 = "DELETE FROM copies WHERE CID=".$row[1];
            $res = mysql_query($sql2, $link); 
       }
       $id=$row[0];
       $data=$row[2];
} 

      

+1


source


remove from copies c where c.cid in (select max (cid) as max_cid, count (*) as num from copies where num> 1 group by id, data)

0


source







All Articles