Is there ever a good reason NOT to use a primary key?

I am currently converting multiple Excel sheets to MS SQL Server databases. Most of the sheets are completely unrelated to each other. And some fields may actually need to allow NULL values.

For anyone who does database design: have you ever run into a situation where it was okay NOT to use a primary key?

If not, what can I do in this situation?

+3


5 answers


In my opinion, every database table should have a primary key. It matters most when you maintain the data: with a key, you can update or delete one specific row directly.

Some databases support internal row identifiers that are visible to users. These are a possible alternative to a primary key, but I prefer an explicitly defined key even in those databases.



In addition, integer identity primary keys do the following:

  • They record the order of insertion into the table.
  • They provide a small optimization for joins that use the key.
  • They distinguish between records that would otherwise be duplicates.
  • They provide a watermarking mechanism to keep track of the last updated record.
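
The last two points can be sketched roughly as follows. This is only an illustration, using Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere; the table `readings` and the helper `rows_since` are hypothetical names, not anything from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY AUTOINCREMENT is SQLite's rough equivalent of an IDENTITY column.
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("a", 1.5), ("b", 2.0)],  # the first two rows are otherwise identical
)

def rows_since(conn, watermark):
    """Return rows whose id is greater than the last id already processed."""
    return conn.execute(
        "SELECT id, sensor, value FROM readings WHERE id > ? ORDER BY id",
        (watermark,),
    ).fetchall()

first_batch = rows_since(conn, 0)
watermark = first_batch[-1][0]  # highest id seen so far

conn.execute("INSERT INTO readings (sensor, value) VALUES (?, ?)", ("c", 3.0))
second_batch = rows_since(conn, watermark)  # only the newly inserted row
```

Note that a watermark built this way only notices newly inserted rows. Catching changes to existing rows would need something extra, such as a modification timestamp column.
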
+3


The question of primary keys is not directly related to whether other columns in the table allow NULLs, nor to whether the table needs to be joined to other tables (although primary keys are used for such joins).

Rather, primary keys are for establishing and maintaining the identity of the objects represented by the rows in your table. In any application where you need to know which real-world "thing" a row refers to, or where one row with a given set of values is not identical to or interchangeable with another row with the same values, you need a primary key.



You don't need a primary key when your table is only used to produce aggregate results, in which no individual source row matters. This covers a wide range of reporting and analysis situations. A primary key doesn't hurt there, but it serves no purpose.

You might especially want to avoid a primary key in an analytic situation where the data is an anonymized extract of a larger dataset. In that case, the absence of a primary key helps to ensure that the information cannot be traced back to the original source.

+1


If you have no way to uniquely identify each row in the original data, and you may ever need to manipulate or retrieve a specific row, then you can create an artificial primary key, for example 'Entry_ID'.

The main problem I see in your scenario is importing the data and then having to modify it.

Say you import this:

Name | Age | Favourite Colour
-----------------------------
Anne | 23  |  red
John | 34  |  blue
John | 34  |  blue

      

If you want to remove one of the "John, 34, blue" rows, how would you do it? It is possible with some awkward code (and I expect you will have more than three columns):

-- T-SQL: remove only one of the matching duplicate rows
DELETE TOP (1)
FROM testPK
WHERE name = 'John'
  AND age = 34
  AND favouriteColour = 'blue';

      

But if you have this

Entry_ID | Name | Age | Favourite Colour
----------------------------------------
10001    |Anne | 23  |  red
10002    |John | 34  |  blue
10003    |John | 34  |  blue

      

Then it is as simple as:

DELETE FROM testPK WHERE Entry_ID = 10003;
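
To make the difference concrete, here is a small sketch of the two deletes side by side. It uses Python's built-in sqlite3 module rather than SQL Server, purely so that it is runnable as-is; the table names `no_key` and `with_key` are made up for the illustration, and SQLite has no `DELETE TOP (1)`, so the keyless delete simply matches on the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE no_key (name TEXT, age INTEGER, favouriteColour TEXT)")
conn.execute(
    "CREATE TABLE with_key ("
    "Entry_ID INTEGER PRIMARY KEY, name TEXT, age INTEGER, favouriteColour TEXT)"
)

rows = [("Anne", 23, "red"), ("John", 34, "blue"), ("John", 34, "blue")]
conn.executemany("INSERT INTO no_key VALUES (?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO with_key VALUES (?, ?, ?, ?)",
    [(10001 + i, *row) for i, row in enumerate(rows)],
)

# Without a key, matching on the values hits every duplicate at once.
deleted_no_key = conn.execute(
    "DELETE FROM no_key WHERE name = 'John' AND age = 34 AND favouriteColour = 'blue'"
).rowcount

# With a key, exactly one row is addressed.
deleted_with_key = conn.execute(
    "DELETE FROM with_key WHERE Entry_ID = 10003"
).rowcount

remaining_johns = conn.execute(
    "SELECT COUNT(*) FROM with_key WHERE name = 'John'"
).fetchone()[0]
```

In this sketch the value-based delete removes both John rows, while the key-based delete removes one and leaves the other in place.
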

      

+1


In my experience, there are many situations where you don't need to use a PK. Especially if you are importing data from external sources, you can bulk-import everything into a staging area first and then process and distribute the data from there (ETL). That works better for performance, deduplication, cleanup, etc.
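
A minimal sketch of that staging pattern, under some assumptions: it uses Python's built-in sqlite3 module instead of SQL Server so that it can be run directly, and the tables `staging` and `person` and their columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging table: no key, no constraints, accepts whatever the source contains.
conn.execute("CREATE TABLE staging (name TEXT, age TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("Anne", "23"), ("John ", "34"), ("John", "34"), (None, None)],
)

# Final table: surrogate primary key plus cleaned, typed columns.
conn.execute(
    "CREATE TABLE person ("
    "person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# Trim whitespace, cast age to a number, skip empty rows, and deduplicate on the way in.
conn.execute(
    """
    INSERT INTO person (name, age)
    SELECT DISTINCT TRIM(name), CAST(age AS INTEGER)
    FROM staging
    WHERE name IS NOT NULL
    """
)

people = conn.execute(
    "SELECT person_id, name, age FROM person ORDER BY person_id"
).fetchall()
```

The staging table stays keyless because its rows are throwaway. The key only appears once the rows have been cleaned and are worth identifying individually.
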

Sometimes you can also use some vocabulary tables with a FREETEXT lookup, which also does not necessarily require a PK.

Most of the time, though, your table will have a PK, for many reasons: performance, organization, etc.

0


In my early database design experience, I often left out primary keys, especially with data imported from other sources like your Excel sheets. And nothing terrible happened. But in retrospect, I was playing with fire and a lot could easily go wrong.

So I think the best answer to this question is to turn it on its head: is there ever a situation where using a primary key would be a bad idea? I can't think of a situation where a primary key would cause a problem.

As for converting Excel files, the approach I take is to import the Excel sheet directly as a table that exists only to hold the data until I move it into a "real" table for use in the database. I create the "real" table with a primary key IDENTITY field plus all the fields from the Excel sheet, and use INSERT INTO to transfer the data, like this:

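CREATE TABLE real_table
    (
    Pkey int IDENTITY(1,1) NOT NULL PRIMARY KEY  -- surrogate key generated by SQL Server
    , Column_A varchar(255) NULL
    , Column_B varchar(255) NULL
    );

-- Copy the rows from the import table; Pkey is filled in automatically
INSERT INTO real_table(
    Column_A
    , Column_B)
SELECT
    Column_A
    , Column_B
FROM Excel_import_table;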

      

0

