Checking for duplicates before inserting into SQL database

I need to write an INSERT statement to add unique client names to a table on my server. However, the database already contains thousands of clients, and when we insert new clients we need to check whether they already exist before adding them to the system.

My question is: what would be the best / fastest way to do this? Would it be better to run a simple SELECT query on the clients table (ordered ASC) and do a binary search or something on the results client-side, or just execute a SQL statement like the one below?

IF NOT EXISTS (SELECT 1 FROM clients AS c WHERE c.clientname = ?)
BEGIN
  INSERT INTO clients (clientname, address, ...)
  VALUES (?, ?, ...)
END
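The T-SQL IF wrapper above can also be expressed as a single portable statement, INSERT ... SELECT ... WHERE NOT EXISTS. A minimal sketch using SQLite so the example is self-contained (table and column names follow the question; the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (clientname TEXT, address TEXT)")

def insert_if_absent(name, address):
    # Single-statement equivalent of the IF NOT EXISTS ... INSERT pattern:
    # the INSERT only fires when no row with this clientname exists yet.
    conn.execute(
        "INSERT INTO clients (clientname, address) "
        "SELECT ?, ? WHERE NOT EXISTS "
        "(SELECT 1 FROM clients WHERE clientname = ?)",
        (name, address, name),
    )

insert_if_absent("Acme", "1 Main St")
insert_if_absent("Acme", "1 Main St")  # duplicate: no row added
count = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
print(count)  # 1
```

Note that this check-then-insert is done inside the database server, so no client data is transferred; it is only guaranteed race-free if run inside a transaction or backed by a unique index.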


Is this statement slow? I may need to run the insert several hundred times per page view.



2 answers


The standard advice is to create a UNIQUE constraint if you want a given column to be unique.

ALTER TABLE clients ADD UNIQUE KEY (clientname);

      



Then simply attempt the INSERT: it will succeed if there is no matching row, and it will fail with a duplicate-key error if there is. No SELECT required.
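A sketch of this behavior, using SQLite so the example is self-contained (in MySQL the UNIQUE KEY above behaves analogously, and INSERT IGNORE plays the role of SQLite's INSERT OR IGNORE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE on clientname makes the database itself enforce uniqueness.
conn.execute("CREATE TABLE clients (clientname TEXT UNIQUE, address TEXT)")

conn.execute("INSERT INTO clients VALUES ('Acme', '1 Main St')")
try:
    # Second insert of the same name violates the constraint.
    conn.execute("INSERT INTO clients VALUES ('Acme', '1 Main St')")
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)

# If you prefer silently skipping duplicates instead of catching an error:
conn.execute("INSERT OR IGNORE INTO clients VALUES ('Acme', '1 Main St')")
count = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
print(count)  # 1
```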



It is not unusual to estimate the cost of a SQL query in terms of disk operations, where one block read or write (typically 8 KB) is the unit of cost. (An in-memory database changes this line of thought somewhat.)

If you have hundreds, possibly thousands, of items, and each item is, say, 20 bytes, then your complete table will probably fit in a single disk block (around 400 items per block). Maybe it needs a couple more blocks, but either way it is a pleasantly small number. With a database this small, the data will likely sit in the database's memory cache, and you only pay for write accesses. And as the database grows, the number of block accesses per lookup stays small (logarithmic) as long as you have an index.

Both your solution and Bill's will not incur a write access if the item is already present in the database, and should therefore be equally fast.

The interesting part would be:

I may have to run the insert several hundred times per page view.



This means that you may write the same disk block hundreds of times. It would be faster if you could do it in one step. Unfortunately I am not aware of a standard SQL feature that guarantees this, but MySQL's INSERT offers a way to specify multiple rows of values in one statement. This MAY be a significant win (I don't know how cleverly MySQL handles this situation), but it is MySQL-specific and not portable.
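A sketch of the multi-row form, again using SQLite for a self-contained example (the sample client rows are made up; MySQL accepts the same multi-row VALUES syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (clientname TEXT UNIQUE, address TEXT)")

new_clients = [("Acme", "1 Main St"), ("Globex", "2 Oak Ave"), ("Initech", "3 Elm Rd")]

# Build one multi-row INSERT instead of hundreds of single-row statements,
# so all rows travel to the server in a single round trip.
placeholders = ", ".join(["(?, ?)"] * len(new_clients))
params = [value for row in new_clients for value in row]
conn.execute(
    f"INSERT INTO clients (clientname, address) VALUES {placeholders}", params
)

count = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
print(count)  # 3
```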

Another way to speed things up is not to wait for the changed blocks to be written to disk. This carries the risk of unannounced data loss, but can be a significant performance improvement. Again, the details depend on the DBMS in use. For example, if you are using MySQL with InnoDB, you can set the option innodb_flush_log_at_trx_commit=0 in my.ini to achieve this behavior.

Would it be better to run a simple SELECT query on the clients table (ordered ASC) and do a binary search or something on the results?

This would unnecessarily copy large amounts of data from your DBMS to the client (which may be a different machine, talking over a network protocol). It would still be OK for your small database, but it does not scale well. It could only be useful if it helped you batch your writes into a single disk operation.







