MySQL - remove duplicates
I have a table with a barcode column that has a unique index. The data was loaded with extra characters (-xx) at the end of each barcode to prevent duplication, but there will be many duplicates once I strip the suffix. Here's some sample data:
itemnumber barcode
17912 2-14
18082 2-1
21870 2-10
29219 2-8
Then I created two temporary tables, marty and manny, both with the item number and the barcode with the suffix stripped. So both tables contain:
itemnumber barcode
17912 2
18082 2
21870 2
29219 2
etc.
I then tried to delete everything but the first entry for barcode "2" in the marty table (and likewise for every other barcode). My hope was to then update the original table with the correct first record, so that users can fix the duplicates in the application on their own.
This is my query to delete everything but the first record for each barcode in the marty table:
DELETE FROM marty
WHERE itemnumber NOT IN
(SELECT MIN(itemnumber) FROM manny GROUP BY barcode)
There are 130,000 rows in both marty and manny. The query ran for over 24 hours and then did not finish cleanly: the connection to the server dropped and the query did not complete all of its changes.
Is there a better way to approach this without the subquery, which I think is causing the delay? The GROUP BY over so many rows is probably slowing it down as well.
Thanks.
MySQL is known to be slow when IN (or NOT IN) is used with very large sets. An alternative approach:
Use a script to build a long condition of the form itemnumber = X OR itemnumber = Y OR itemnumber = Z (chunk size ~1000) and INSERT the matching rows (i.e. those that would not have been DELETEd by the previous query) into a new table. Then TRUNCATE the existing table and load the contents of the new table back into the old one with INSERT INTO marty SELECT * FROM marty_tmp.
You can lock the table or use a transaction for the final TRUNCATE, INSERT.
Edit:
- Run SELECT MIN(itemnumber) FROM manny GROUP BY barcode from a script and store the result as an array of desired item numbers.
- Take a batch of 1000 of the desired item numbers and run this query: INSERT INTO marty_tmp SELECT * FROM manny WHERE itemnumber = desiredItemNumbers[0] OR itemnumber = desiredItemNumbers[1] ... Repeat this query until you run out of desired item numbers (NB: the last query will probably contain fewer than 1000 of them).
- You now have a table with the rows that would have been left had the DELETE completed, so swap the contents of the tables marty and marty_tmp:
- TRUNCATE marty
- INSERT INTO marty SELECT * FROM marty_tmp
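The batching step can be sketched in a few lines of Python. This is only an illustration of the idea, not the answerer's actual script: the function name build_chunked_inserts is hypothetical, the tables marty_tmp (target) and manny (source) are assumed from the steps above, and the function only builds the statement strings. Running them against the server is left to whichever MySQL client library you use.

```python
from typing import Iterable, List


def build_chunked_inserts(item_numbers: Iterable[int], chunk_size: int = 1000) -> List[str]:
    """Return one INSERT ... SELECT statement per chunk of item numbers.

    Hypothetical helper: table names marty_tmp and manny are assumptions
    taken from the thread, not something this code verifies.
    """
    # int() rejects anything non-numeric before it reaches the SQL text.
    ids = [int(n) for n in item_numbers]
    statements = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # Build "itemnumber = X OR itemnumber = Y ..." for this chunk.
        where = " OR ".join(f"itemnumber = {n}" for n in chunk)
        statements.append(
            f"INSERT INTO marty_tmp SELECT * FROM manny WHERE {where}"
        )
    return statements


if __name__ == "__main__":
    # Two item numbers fit in a single chunk, so one statement is printed.
    for statement in build_chunked_inserts([17912, 30133]):
        print(statement)
```

The list passed in would be the result of the SELECT MIN(itemnumber) ... GROUP BY barcode query; each returned string is then executed in turn, and the last one simply carries fewer conditions.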
Here's a two-step approach that avoids NOT IN. It also does not use the "manny" temporary table. First, join "marty" with itself to identify the rows for which itemnumber != MIN(itemnumber). Use UPDATE to set barcode to NULL for these rows. The second pass, a DELETE, then removes all rows marked in the first pass.
In this example, I've split the barcode column of "marty" into two columns; the same can be done with the table in its original format with some modification (you would need to split the column values on the fly).
mysql> select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
| 17912 | 2 | 14 |
| 18082 | 2 | 1 |
| 21870 | 2 | 10 |
| 29219 | 2 | 8 |
| 30133 | 3 | 5 |
| 30134 | 3 | 7 |
| 30139 | 3 | 9 |
| 30142 | 3 | 12 |
+------------+---------+---------+
8 rows in set (0.00 sec)
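-- Pass 1: mark every row that is not the lowest itemnumber for its barcode.
UPDATE
  (marty m1
   JOIN
     -- m2 holds the itemnumber to keep for each barcode
     (SELECT barcode,
             MIN(itemnumber) AS itemnumber
      FROM marty
      GROUP BY barcode) m2
   USING (barcode))
SET m1.barcode = NULL
WHERE m1.itemnumber != m2.itemnumber;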
mysql> select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
| 17912 | 2 | 14 |
| 18082 | NULL | 1 |
| 21870 | NULL | 10 |
| 29219 | NULL | 8 |
| 30133 | 3 | 5 |
| 30134 | NULL | 7 |
| 30139 | NULL | 9 |
| 30142 | NULL | 12 |
+------------+---------+---------+
8 rows in set (0.00 sec)
DELETE FROM marty WHERE barcode IS NULL;
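As a cross-check on what the two passes should leave behind when the barcode column is still in its original suffixed form, the same rule can be sketched in Python. This is not MySQL and not part of the answer; it only mirrors the logic (strip the suffix, keep the smallest itemnumber per barcode). The function name split_keep_and_duplicates is hypothetical, the assumption that the suffix is everything after the last hyphen is mine, and the sample rows recombine the barcode and subcode columns shown above.

```python
from typing import Dict, Iterable, List, Tuple


def split_keep_and_duplicates(
    rows: Iterable[Tuple[int, str]],
) -> Tuple[List[int], List[int]]:
    """Return (item numbers to keep, item numbers that are duplicates)."""
    rows = list(rows)
    first_by_barcode: Dict[str, int] = {}
    for itemnumber, barcode in rows:
        # Assumption: the suffix is everything after the last hyphen.
        head, sep, _suffix = barcode.rpartition("-")
        base = head if sep else barcode
        # Remember the smallest itemnumber seen for each stripped barcode.
        current = first_by_barcode.get(base)
        if current is None or itemnumber < current:
            first_by_barcode[base] = itemnumber
    keep = sorted(first_by_barcode.values())
    keep_set = set(keep)
    duplicates = sorted(n for n, _ in rows if n not in keep_set)
    return keep, duplicates


if __name__ == "__main__":
    sample = [
        (17912, "2-14"), (18082, "2-1"), (21870, "2-10"), (29219, "2-8"),
        (30133, "3-5"), (30134, "3-7"), (30139, "3-9"), (30142, "3-12"),
    ]
    keep, duplicates = split_keep_and_duplicates(sample)
    print("keep:", keep)
    print("duplicates:", duplicates)
```

On the sample rows this keeps 17912 and 30133, the same two rows that still have a non-NULL barcode after the UPDATE above.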