MySQL - remove duplicates

I have a table with a barcode column that has a unique index. The data was loaded with extra characters (-xx) appended to each barcode to prevent duplication, but there will be many duplicates once I strip the suffix. Here's some sample data:

itemnumber  barcode

17912       2-14
18082       2-1
21870       2-10
29219       2-8

      

Then I created two temporary tables, marty and manny, both holding the item number and the stripped barcode. So both tables contain

itemnumber  barcode

17912       2
18082       2
21870       2
29219       2

      

etc.

I then tried to delete everything except the first entry for barcode "2" in the marty table (and likewise for every other barcode). My hope was to then update the original table with the correct first record, so that users could fix the remaining duplicates in the application themselves.

This is my query to delete everything but the first record in the marty table for each barcode:

DELETE FROM marty
  WHERE itemnumber NOT IN
    (SELECT MIN(itemnumber) FROM manny GROUP BY barcode)

      

There are 130,000 rows in both marty and manny. The query ran for over 24 hours and never finished properly: the connection to the server dropped and the query did not complete.

Is there a better way to approach this without the subquery, which I think is causing the delay? The GROUP BY is probably also slow with that many rows.

Thanks

+3



4 answers


Another option: this removes the duplicates without the second temporary table (manny):



 DELETE m1
 FROM marty m1
 JOIN marty m2
    ON m1.barcode = m2.barcode
    AND m1.itemnumber > m2.itemnumber;

      

+2




MySQL is known to be slow when IN is used with very large sets. An alternative:

Use a script to build a long condition of the form itemnumber = X OR itemnumber = Y OR itemnumber = Z (chunk size ~1000) and INSERT the matching rows (i.e. those that would not have been DELETEd by the previous query) into a new table. Then TRUNCATE the existing table and load the contents of the new table back into the old one with INSERT INTO marty SELECT * FROM marty_tmp.

You can lock the table, or wrap the final steps in a transaction (in that case use DELETE FROM marty rather than TRUNCATE, since TRUNCATE causes an implicit commit in MySQL).



Edit:

  • Run SELECT MIN(itemnumber) FROM manny GROUP BY barcode from a script and store the result in an array of the desired item numbers, desiredItemNumbers.
  • Take a batch of 1000 of the desired item numbers and run this query: INSERT INTO marty_tmp SELECT * FROM marty WHERE itemnumber = desiredItemNumbers[0] OR itemnumber = desiredItemNumbers[1] ... Repeat until you run out of desired item numbers (the last query will probably contain fewer than 1000 of them).
  • You now have a table holding the rows that would have remained if the DELETE had finished, so swap the contents of marty and marty_tmp:
  • TRUNCATE marty
  • INSERT INTO marty SELECT * FROM marty_tmp
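
The steps above could be scripted roughly as follows. This is only a sketch under a few assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, SQLite has no TRUNCATE so DELETE FROM takes its place, and the function name keep_first_per_barcode and the tiny chunk size are made up for illustration. Against MySQL you would swap in a MySQL driver and its placeholder style.

```python
import sqlite3


def keep_first_per_barcode(conn, chunk_size):
    """Rebuild marty so only the lowest itemnumber per barcode remains."""
    # Step 1: collect the item numbers worth keeping.
    wanted = [row[0] for row in conn.execute(
        "SELECT MIN(itemnumber) FROM marty GROUP BY barcode")]

    # Step 2: copy the wanted rows into marty_tmp, one OR-chain per chunk.
    conn.execute("CREATE TABLE marty_tmp AS SELECT * FROM marty WHERE 0")
    for start in range(0, len(wanted), chunk_size):
        chunk = wanted[start:start + chunk_size]
        condition = " OR ".join(["itemnumber = ?"] * len(chunk))
        conn.execute(
            "INSERT INTO marty_tmp SELECT * FROM marty WHERE " + condition,
            chunk)

    # Step 3: empty marty and reload it from marty_tmp.
    conn.execute("DELETE FROM marty")  # stand-in for TRUNCATE marty
    conn.execute("INSERT INTO marty SELECT * FROM marty_tmp")
    conn.execute("DROP TABLE marty_tmp")
    conn.commit()


# Sample rows taken from the thread, with the suffix already stripped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marty (itemnumber INTEGER PRIMARY KEY, barcode TEXT)")
conn.executemany("INSERT INTO marty VALUES (?, ?)", [
    (17912, "2"), (18082, "2"), (21870, "2"), (29219, "2"),
    (30133, "3"), (30134, "3"), (30139, "3"), (30142, "3"),
])
keep_first_per_barcode(conn, chunk_size=1)
remaining = conn.execute(
    "SELECT itemnumber, barcode FROM marty ORDER BY itemnumber").fetchall()
```

With the sample rows, remaining holds one row per barcode. A chunk size around 1000, as suggested above, keeps each generated statement to a manageable length.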

+1




Here's a two-step approach that avoids NOT IN. It also does not use the "manny" temporary table. First, join "marty" with itself to identify the rows for which itemnumber != MIN(itemnumber), and use UPDATE to set barcode to NULL for these rows. A second pass with DELETE then removes all rows marked in the first pass.

In this example, I've split the barcode column of "marty" into two columns; the same can be done with the table in its original format with some modification (you need to split the column values on the fly).

select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
|      17912 |       2 |      14 |
|      18082 |       2 |       1 |
|      21870 |       2 |      10 |
|      29219 |       2 |       8 |
|      30133 |       3 |       5 |
|      30134 |       3 |       7 |
|      30139 |       3 |       9 |
|      30142 |       3 |      12 |
+------------+---------+---------+
8 rows in set (0.00 sec)

UPDATE
  (marty m1
   JOIN
     (SELECT barcode,
             MIN(itemnumber) AS itemnumber
      FROM marty
      GROUP BY barcode) m2
   USING(barcode))
SET m1.barcode = NULL WHERE m1.itemnumber != m2.itemnumber;

select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
|      17912 |       2 |      14 |
|      18082 |    NULL |       1 |
|      21870 |    NULL |      10 |
|      29219 |    NULL |       8 |
|      30133 |       3 |       5 |
|      30134 |    NULL |       7 |
|      30139 |    NULL |       9 |
|      30142 |    NULL |      12 |
+------------+---------+---------+
8 rows in set (0.00 sec)

DELETE FROM marty WHERE barcode IS NULL;
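
For the original single-column format, the on-the-fly split might look like the sketch below. Several things here are assumptions rather than part of the answer: the table name items is made up, Python's sqlite3 module stands in for MySQL so the example can run on its own, and the split is spelled with SQLite's substr/instr where MySQL would use SUBSTRING_INDEX(barcode, '-', 1). The marking pass also uses a correlated subquery for brevity; in MySQL you would keep the derived-table join shown above, since MySQL does not accept a subquery on the table being updated. The last statement, which strips the suffix from the surviving rows, is an extra step taken from the question's goal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical name for the original table with suffixed barcodes.
conn.execute(
    "CREATE TABLE items (itemnumber INTEGER PRIMARY KEY, barcode TEXT UNIQUE)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (17912, "2-14"), (18082, "2-1"), (21870, "2-10"), (29219, "2-8"),
    (30133, "3-5"), (30134, "3-7"),
])


def base(col):
    """SQL expression for the text before the first '-' in the given column."""
    return f"substr({col}, 1, instr({col}, '-') - 1)"


# Pass 1: mark every row that is not the lowest itemnumber of its base barcode.
conn.execute(f"""
    UPDATE items SET barcode = NULL
    WHERE itemnumber > (SELECT MIN(m2.itemnumber) FROM items m2
                        WHERE {base('m2.barcode')} = {base('items.barcode')})
""")
# Pass 2: remove the marked rows.
conn.execute("DELETE FROM items WHERE barcode IS NULL")
# Extra step: drop the suffix from the survivors, which no longer collide.
conn.execute(f"UPDATE items SET barcode = {base('barcode')}")
conn.commit()

survivors = conn.execute(
    "SELECT itemnumber, barcode FROM items ORDER BY itemnumber").fetchall()
```

Marking with NULL is compatible with the unique index because a unique index permits any number of NULL values.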

      

+1




If you are creating temporary tables anyway, how about building the table with "INSERT INTO" or "CREATE TABLE .. AS ..." based on:

SELECT MIN(itemnumber) AS itemnumber, barcode
  FROM marty
  GROUP BY barcode
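
A minimal sketch of the "CREATE TABLE .. AS ..." variant, again run through Python's sqlite3 module as a stand-in for MySQL. The target table name marty_dedup is an assumption made for this example; the rows are the sample data from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marty (itemnumber INTEGER, barcode TEXT)")
conn.executemany("INSERT INTO marty VALUES (?, ?)", [
    (17912, "2"), (18082, "2"), (21870, "2"), (29219, "2"),
    (30133, "3"), (30134, "3"), (30139, "3"), (30142, "3"),
])

# Build the de-duplicated table directly instead of deleting rows afterwards.
conn.execute("""
    CREATE TABLE marty_dedup AS
    SELECT MIN(itemnumber) AS itemnumber, barcode
      FROM marty
      GROUP BY barcode
""")

deduped = conn.execute(
    "SELECT itemnumber, barcode FROM marty_dedup ORDER BY itemnumber").fetchall()
```

This way no DELETE runs at all: the new table starts out with exactly one row per barcode.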

      

0

