Removing duplicate records in a MySQL table

I have a table with several thousand rows. The table contains two columns, name and email. I have multiple duplicate rows, for example:

  • John Smith | john@smith.com
  • John Smith | john@smith.com
  • Erica Smith | erica@smith.com
  • Erica Smith | erica@smith.com

What would be the easiest way to remove all duplicate rows? For example, to make the table contents equal to the result of SELECT DISTINCT name, email FROM table.

+3




5 answers


You could do this by inserting the de-duplicated rows into a new table and then renaming that table to replace the original.

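-- Adjust the column definitions to match your real table.
CREATE TABLE `table2` (
  `name` varchar(255),
  `email` varchar(255),
  UNIQUE KEY `email` (`email`));
-- INSERT IGNORE skips any row whose email is already present in table2.
INSERT IGNORE INTO `table2` (`name`, `email`)
  SELECT `name`, `email` FROM `table`;
-- Swap the tables; the original data is kept as table1.
RENAME TABLE `table` TO `table1`;
RENAME TABLE `table2` TO `table`;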

      

Note that this CREATE statement needs to be adjusted to your actual table format. I added a unique key on the email field as a suggestion for how to prevent duplicates in the first place.
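
Before running the two RENAME statements, it can be worth checking that the copy holds what you expect. A minimal sketch, assuming the table names from the example above and that email contains no NULL values (COUNT(DISTINCT ...) skips NULLs, while a unique key allows several of them):

```sql
-- The two numbers should match if table2 holds exactly one row per distinct email.
SELECT
  (SELECT COUNT(DISTINCT `email`) FROM `table`) AS distinct_emails,
  (SELECT COUNT(*) FROM `table2`) AS copied_rows;
```

If the numbers differ, the original table is still untouched at this point, so nothing is lost by dropping table2 and trying again.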



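Alternatively, you can run the following statement in a loop:

DELETE FROM `table`
WHERE `email` IN (
  -- The extra derived table avoids MySQL error 1093 (selecting from the table being deleted from).
  SELECT `email` FROM (
    SELECT `email` FROM `table` GROUP BY `email` HAVING COUNT(*) > 1
  ) AS dup
) LIMIT 1;

      

Each call deletes one duplicate row, so repeat it until no rows are affected. The LIMIT 1 is important: without it, every row of each duplicated email would be deleted, including the one you want to keep.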
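
The loop can also live on the server side. The following is only a sketch: the procedure name remove_duplicate_emails is made up for illustration, the subquery is wrapped in a derived table so that MySQL accepts selecting from the table being deleted from, and DELIMITER is a command of the mysql command-line client rather than SQL, so other clients may need a different way to define the procedure.

```sql
DELIMITER //

-- Hypothetical helper: deletes one duplicate row per iteration
-- until a pass deletes nothing.
CREATE PROCEDURE remove_duplicate_emails()
BEGIN
  DECLARE affected INT DEFAULT 1;

  WHILE affected > 0 DO
    DELETE FROM `table`
    WHERE `email` IN (
      SELECT `email` FROM (
        SELECT `email` FROM `table` GROUP BY `email` HAVING COUNT(*) > 1
      ) AS dup
    ) LIMIT 1;

    -- ROW_COUNT() reports how many rows the DELETE just removed.
    SET affected = ROW_COUNT();
  END WHILE;
END //

DELIMITER ;

CALL remove_duplicate_emails();
```

Each iteration re-runs the GROUP BY over the whole table, so on a table with many duplicates this is likely to be much slower than the copy-and-rename approach.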

+6




The easiest way is to copy all the distinct rows into a new table:



-- MySQL has no SELECT ... INTO <table>; use CREATE TABLE ... AS SELECT instead.
CREATE TABLE NewTable AS
SELECT DISTINCT *
FROM MyTable;

      

+2




-- Assumes a unique id column; keeps the row with the highest id in each group.
DELETE FROM `table`
WHERE id NOT IN (
  SELECT A.id
  FROM (
    SELECT MAX(id) AS id
    FROM `table`
    GROUP BY name, email
  ) A
);
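
The DELETE above relies on an id column, which the table in the question does not have. A possible way to add one and preview the effect, assuming the table has no primary key yet and that the column name id is free:

```sql
-- Assumed setup: add a surrogate key; MySQL numbers the existing rows automatically.
ALTER TABLE `table`
  ADD COLUMN `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- Preview the rows that are not the highest-id row of their (name, email) group.
SELECT *
FROM `table`
WHERE id NOT IN (
  SELECT MAX(id) FROM `table` GROUP BY name, email
);
```

Running the SELECT first shows exactly which rows would go before anything is deleted.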

      

+1




Add an auto-increment column to the table. I believe that when you add it, it will be filled in for the existing rows. Since MySQL does not allow a delete based on a subquery on the same table, the easiest solution is to then copy the entire dataset into a temporary table for processing. Assuming you have called the new column RowId and the temp table tempTable, you can use the following code:

-- Multi-table DELETE: remove every row that is not the highest RowId of its group.
DELETE NameAndEmail
FROM NameAndEmail
INNER JOIN
(     SELECT name, email, MAX(RowId) AS MaxRowId
      FROM tempTable
      GROUP BY name, email
) AS MaxId
   ON NameAndEmail.Email = MaxId.Email
  AND NameAndEmail.Name = MaxId.Name
WHERE NameAndEmail.RowId <> MaxId.MaxRowId;
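
The preparation steps described before the DELETE can be sketched as follows. This assumes the table is named NameAndEmail as in the code above and has no primary key yet; the column types are a guess and should be adapted.

```sql
-- Step 1: add the RowId column; existing rows receive consecutive values.
ALTER TABLE NameAndEmail
  ADD COLUMN RowId INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Step 2: snapshot the data, including the new RowId values, into a temporary table.
CREATE TEMPORARY TABLE tempTable AS
  SELECT name, email, RowId FROM NameAndEmail;

-- Step 3: run the DELETE that joins NameAndEmail against tempTable.

-- Step 4: clean up (the temporary table also disappears when the session ends).
DROP TEMPORARY TABLE tempTable;
```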

      

+1




Add a unique index

The easiest way to clean up a table with duplicate data is to simply add a unique index:

set session old_alter_table=1;
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX (name, email);

      

Pay special attention to the first SQL statement: without it, the IGNORE flag is ignored and the ALTER TABLE statement will fail.
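
ALTER IGNORE TABLE was removed in MySQL 5.7.4, so the statements above only work on older servers. On newer versions a similar effect can be had with INSERT IGNORE into a copy of the table that already carries the unique index. This is a sketch, not part of the original answer; the names table_dedup, table_old and uniq_name_email are made up for illustration.

```sql
-- Empty copy of the original table with the same columns and indexes.
CREATE TABLE `table_dedup` LIKE `table`;

-- Add the unique index while the copy is still empty.
ALTER TABLE `table_dedup`
  ADD UNIQUE INDEX `uniq_name_email` (`name`, `email`);

-- Rows that would violate the unique index are silently skipped.
INSERT IGNORE INTO `table_dedup` SELECT * FROM `table`;

-- Swap the tables in one statement; the original is kept as table_old.
RENAME TABLE `table` TO `table_old`, `table_dedup` TO `table`;
```

Keeping table_old around until the result has been checked leaves a way back if something looks wrong.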

+1

