Removing duplicate records in a MySQL table
I have a table with several thousand rows. The table contains two columns, name and email. I have multiple duplicate rows, for example:
- John Smith | john@smith.com
- John Smith | john@smith.com
- Erica Smith | erica@smith.com
- Erica Smith | erica@smith.com
What would be the easiest way to remove all the duplicates? For example, to make the table contents equal to SELECT name, DISTINCT(email) FROM table.
You can do this by selecting the deduplicated rows into a new table and then renaming that table to replace the original.
CREATE TABLE `table2` (
`name` varchar(255),
`email` varchar(255),
UNIQUE KEY `email` (`email`));
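-- Keep one row per email; MIN(`name`) picks a single name deterministically
INSERT INTO `table2` (`name`, `email`) SELECT MIN(`name`), `email` FROM `table` GROUP BY `email`;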
RENAME TABLE `table` TO `table1`;
RENAME TABLE `table2` TO `table`;
Note that the CREATE TABLE statement needs to be adjusted to your actual table format. I added a unique key on the email column as a suggestion for preventing duplicates in the first place.
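If you want to verify the copy before swapping the tables, a rough check along these lines may help. It assumes email is never NULL and that both tables use the same collation; the single multi-table RENAME is an optional alternative to the two separate RENAME statements, since MySQL performs it as one atomic operation.

```sql
-- Sanity check: these two counts should match before the swap
SELECT COUNT(DISTINCT `email`) FROM `table`;
SELECT COUNT(*) FROM `table2`;

-- Swap both names in one atomic statement; the original is kept as `table1`
RENAME TABLE `table` TO `table1`, `table2` TO `table`;
```

Once you are satisfied with the result, `table1` can be dropped.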
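Alternatively, you can run the following statement in a loop:
DELETE FROM `table`
WHERE `email` IN (
SELECT `email` FROM (
SELECT `email` FROM `table` GROUP BY `email` HAVING COUNT(*) > 1
) AS `dup`
) LIMIT 1;
Each call deletes one duplicate row; repeat until no rows are affected. The extra derived table (`dup`) is there because MySQL does not allow a DELETE to select directly from its own target table in a subquery. The LIMIT 1 is important: without it, every row with a duplicated email would be deleted, including the copy you want to keep.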
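The looping itself can be done from application code or, as a minimal sketch, inside a stored procedure. The procedure name remove_duplicate_emails and the variable affected are made up for illustration, and DELIMITER is a mysql command-line client directive, so other clients may need a different way to submit the procedure body.

```sql
DELIMITER //

-- Hypothetical helper: repeats the single-row DELETE until nothing is left to remove
CREATE PROCEDURE remove_duplicate_emails()
BEGIN
  DECLARE affected INT DEFAULT 1;
  REPEAT
    DELETE FROM `table`
    WHERE `email` IN (
      -- The derived table avoids selecting directly from the DELETE target
      SELECT `email` FROM (
        SELECT `email` FROM `table` GROUP BY `email` HAVING COUNT(*) > 1
      ) AS `dup`
    )
    LIMIT 1;
    SET affected = ROW_COUNT();  -- rows removed by the DELETE just above
  UNTIL affected = 0 END REPEAT;
END //

DELIMITER ;

CALL remove_duplicate_emails();
```

Because the duplicate list is recomputed for every deleted row, this is likely to be slow on tables with many duplicates; it is best suited to small cleanups.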
Add an auto-increment column to the table. I believe that when you add it, it will be filled in for the existing rows automatically. Since MySQL does not allow a deletion based on a subquery on the same table, the easiest solution is then to copy the entire dataset into a temporary table for processing. Assuming you have called the new column RowId and the temporary table tempTable, you can use the following code:
-- Delete every row except the one with the highest RowId in each (name, email) group
DELETE NameAndEmail
FROM NameAndEmail
INNER JOIN
( SELECT name, email, MAX(RowId) AS MaxRowId
FROM tempTable
GROUP BY name, email
) AS MaxId
ON NameAndEmail.Email = MaxId.Email
AND NameAndEmail.Name = MaxId.Name
WHERE NameAndEmail.RowId <> MaxId.MaxRowId;
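The preparation steps described before the DELETE might look roughly like this. It is a sketch that assumes NameAndEmail has no primary key yet; the column type chosen for RowId is only a suggestion.

```sql
-- Assumes NameAndEmail has no primary key yet; existing rows receive sequential RowId values
ALTER TABLE NameAndEmail
  ADD COLUMN RowId INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Snapshot taken after RowId exists, so the DELETE can read from it instead of NameAndEmail
CREATE TEMPORARY TABLE tempTable AS
  SELECT name, email, RowId FROM NameAndEmail;

-- ... run the DELETE here ...

-- Clean up once the duplicates are gone
DROP TEMPORARY TABLE tempTable;
```

A temporary table is visible only to the current session and is dropped automatically when the session ends, so the explicit DROP is optional.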
Add a unique index
The easiest way to clean up a table with duplicate data is to simply add a unique index:
set session old_alter_table=1;
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX (name, email);
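Pay special attention to the first SQL statement: without it, the IGNORE flag is not honored and the ALTER TABLE statement will fail with a duplicate-key error. Note also that ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this approach only works on older versions.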
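To confirm that the index did its job, or to preview what would be removed before running it, a query like the following lists any (name, email) pairs that occur more than once. The alias copies is just an illustrative name.

```sql
-- Lists each duplicated (name, email) pair with its number of occurrences
SELECT `name`, `email`, COUNT(*) AS `copies`
FROM `table`
GROUP BY `name`, `email`
HAVING COUNT(*) > 1;
```

After a successful cleanup this should return an empty result set.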