Primary key choice and performance

We have a table whose PK is CreateDateTime. The program that does the inserts is multithreaded, so this is obviously not thread-safe. There is a 3-column composite key that can be built from existing data in the table. This is a log table and traffic is expected to be high. The main concern is performance.

One of our developers thinks that because SQL Server stores date and time internally as a number (epoch-based?), DateTime is the optimal PK for performance reasons.

I suggested creating a composite key from the three columns. Will this change negatively impact performance?

It has also been suggested to use a GUID as the PK. Again, its performance is questionable.

Is it better to switch to the composite key? If so, how can we address or explain these performance concerns?

GUIDs might seem like a natural choice for your PRIMARY KEY, and if you really must, you could probably argue for using one as the PRIMARY KEY of a table. What I strongly recommend not doing is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

1) the primary key is a logical construct: one of the candidate keys that uniquely and reliably identifies every row in your table. It can be anything, really: an INT, a GUID, a string. Pick what makes the most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) is a physical, storage-related thing, and here a small, stable, ever-increasing data type is your best choice: INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key, but that doesn't need to be that way! I have personally seen significant performance gains when breaking up a previous GUID-based primary/clustering key into two separate keys: the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1, 1) column.

As Kimberly Tripp (the Queen of Indexing) and others have stated many times, a GUID as the clustering key is not optimal, since its randomness leads to massive page and index fragmentation and to generally poor performance.

Yes, I know, there is newsequentialid() in SQL Server 2005 and up, but even that is not truly and completely sequential, and therefore it also suffers from the same problems as the GUID, just a bit less prominently.
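
The fragmentation argument can be illustrated with a toy model of a clustered index's leaf level. This is a sketch under simplifying assumptions, not how SQL Server actually allocates pages: each page holds a fixed number of keys (100 here, an arbitrary choice), a full page is split 50/50 when a key lands in its middle, and a new highest key simply starts a fresh page at the end. Random 128-bit integers stand in for NEWID() values; the function name simulate_leaf_level is hypothetical.

```python
import bisect
import random


def simulate_leaf_level(keys, page_capacity=100):
    """Toy model of a clustered index leaf level; returns (page_count, page_splits).

    Each page is a sorted list of at most `page_capacity` keys. Real SQL Server
    pages are 8 KB of bytes, not a fixed row count, so this is only a sketch.
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # The lowest key of every page after the first tells us where `key` belongs.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= page_capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # New highest key: start a fresh page at the end, no split needed.
            pages.append([page.pop()])
        else:
            # Key landed inside a full page: split it roughly in half.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
            splits += 1
    return len(pages), splits


rows = 10_000
rng = random.Random(42)
sequential_keys = range(rows)  # ever-increasing, like INT IDENTITY(1, 1)
random_keys = [rng.getrandbits(128) for _ in range(rows)]  # stand-in for NEWID()

seq_pages, seq_splits = simulate_leaf_level(sequential_keys)
rnd_pages, rnd_splits = simulate_leaf_level(random_keys)
print(f"sequential: {seq_pages} pages, {seq_splits} splits")
print(f"random:     {rnd_pages} pages, {rnd_splits} splits")
```

In this model the sequential keys fill every page completely without a single split, while the random keys cause many splits and leave pages partly empty, so the same rows occupy more pages.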

Then there's another issue: the clustering key on a table is also added to every single entry of every non-clustered index on that table, so you really want to make sure it is as small as possible. Typically, an INT (with over 2 billion possible values) should be sufficient for the vast majority of tables, and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.



Quick calculation, using INT vs. GUID as the primary and clustering key:

  • Base table with 1'000'000 rows (3.81 MB vs. 15.26 MB)
  • 6 non-clustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: roughly 27 MB vs. 107 MB, and that's just for one table!
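
The figures above can be reproduced with a few lines of arithmetic. This sketch assumes, as the figures appear to, that only the key column's bytes are counted (INT is 4 bytes, uniqueidentifier is 16 bytes), that the key is copied once into each non-clustered index row, and that 1 MB = 1,048,576 bytes; row headers and other columns are ignored. The function name key_overhead_mb is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # binary megabytes, which reproduce the figures above


def key_overhead_mb(rows, key_bytes, nonclustered_indexes):
    """Storage spent on the clustering key alone, in MB.

    The key is stored once in the base table and copied into every row of
    every non-clustered index. Row overhead and other columns are ignored.
    """
    base_table = rows * key_bytes / BYTES_PER_MB
    nc_indexes = nonclustered_indexes * base_table
    return base_table, nc_indexes, base_table + nc_indexes


for name, size in (("INT", 4), ("GUID", 16)):
    base, nc, total = key_overhead_mb(1_000_000, size, 6)
    print(f"{name:>4}: base {base:.2f} MB, 6 NC indexes {nc:.2f} MB, total {total:.2f} MB")
```

This prints totals of 26.70 MB for INT and 106.81 MB for GUID; the gap grows linearly with both the row count and the number of non-clustered indexes.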

Some more food for thought: great material from Kimberly Tripp. Read it, read it again, digest it! It's truly the gospel of SQL Server indexing.

Regarding composite keys: the main drawback, in my opinion, is that every table that wants to reference this composite PK must now have a composite FK containing all the columns defined in the composite PK. That becomes very cumbersome when you have many queries that need to join child tables to this table using these compound keys.

DATETIME by itself doesn't seem like a good idea for a primary key either: in SQL Server, DATETIME has a precision of about 3.33 ms, so on a busy system the chances of getting duplicate values are good, and duplicate values are not a good idea for a PK!
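
To make the collision risk concrete, here is a small simulation. It assumes a simplified rounding model in which each timestamp is rounded to the nearest 1/300-second tick (DATETIME's resolution); real DATETIME values are rounded to increments of .000, .003 or .007 seconds, which this only approximates. The workload of one insert per millisecond is hypothetical, and the function names are illustrative.

```python
TICKS_PER_SECOND = 300  # DATETIME time resolution: 1/300 s, about 3.33 ms


def to_datetime_tick(offset_us: int) -> int:
    """Round an offset in microseconds to the nearest 1/300-second tick (toy model)."""
    return (offset_us * TICKS_PER_SECOND + 500_000) // 1_000_000


def duplicate_keys(offsets_us):
    """Count inserts whose rounded timestamp was already taken by an earlier insert."""
    seen = set()
    duplicates = 0
    for offset in offsets_us:
        tick = to_datetime_tick(offset)
        if tick in seen:
            duplicates += 1
        seen.add(tick)
    return duplicates


# Hypothetical load: 1,000 inserts, one every millisecond, over one second.
offsets = [i * 1_000 for i in range(1_000)]
print(duplicate_keys(offsets))  # → 699
print(to_datetime_tick(0) == to_datetime_tick(1_000))  # → True
```

Only 301 distinct tick values are available in that second, so 699 of the 1,000 inserts would collide with an earlier row's key, and two inserts just 1 ms apart already map to the same value.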

If performance is your top priority, I would definitely recommend using a single-column INT IDENTITY (even if it is a surrogate column) for your unique, clustered primary key index.
