Structure of a many-to-many relationship in SQL Server with or without an additional primary key column?

Let's assume we have two tables, Roles and Reports, with a many-to-many relationship between them. The only solution I can think of is to create a junction table named RoleReport. I see two approaches to the structure of this table:

1. Columns: RoleReportId, RoleId, ReportId
   PK: RoleReportId
2. Columns: RoleId, ReportId
   PK: RoleId, ReportId
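
For concreteness, the two options could be declared roughly as follows in T-SQL. This is only a sketch: the INT key types, the dbo schema, the constraint names, and the _Option1 / _Option2 table suffixes (used only so that both variants can coexist in one script) are assumptions, not part of the question.

```sql
-- Minimal parent tables; the column types are assumed.
CREATE TABLE dbo.Roles   (RoleId   INT NOT NULL CONSTRAINT PK_Roles   PRIMARY KEY);
CREATE TABLE dbo.Reports (ReportId INT NOT NULL CONSTRAINT PK_Reports PRIMARY KEY);

-- Option 1: a surrogate key is the primary key.
-- As listed in the question, nothing here prevents duplicate
-- (RoleId, ReportId) pairs.
CREATE TABLE dbo.RoleReport_Option1 (
    RoleReportId INT IDENTITY(1,1) NOT NULL,
    RoleId       INT NOT NULL,
    ReportId     INT NOT NULL,
    CONSTRAINT PK_RoleReport_Option1 PRIMARY KEY (RoleReportId),
    CONSTRAINT FK_RoleReport_Option1_Roles
        FOREIGN KEY (RoleId) REFERENCES dbo.Roles (RoleId),
    CONSTRAINT FK_RoleReport_Option1_Reports
        FOREIGN KEY (ReportId) REFERENCES dbo.Reports (ReportId)
);

-- Option 2: the pair of foreign keys is itself the primary key,
-- so each (RoleId, ReportId) pair can appear at most once.
CREATE TABLE dbo.RoleReport_Option2 (
    RoleId   INT NOT NULL,
    ReportId INT NOT NULL,
    CONSTRAINT PK_RoleReport_Option2 PRIMARY KEY (RoleId, ReportId),
    CONSTRAINT FK_RoleReport_Option2_Roles
        FOREIGN KEY (RoleId) REFERENCES dbo.Roles (RoleId),
    CONSTRAINT FK_RoleReport_Option2_Reports
        FOREIGN KEY (ReportId) REFERENCES dbo.Reports (ReportId)
);
```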

      

Is there any real difference between the two (performance or whatever)?

+2




7 replies


You will need a composite UNIQUE index on (RoleId, ReportId) anyway.

There is no point in not making it the PRIMARY KEY.

If you make it a CLUSTERED PRIMARY KEY (the default), it will perform better, since it will be smaller.

A clustered primary key will contain only two columns in each record, RoleID and ReportID, whereas a secondary index would contain three: RoleID, ReportID and RoleReportID (as a row pointer).

You can create an additional index on ReportID that can be used to find all Roles for a given Report.
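
A minimal T-SQL sketch of this layout follows. The INT column types, the dbo schema, the constraint and index names, and the example value 42 are assumptions; the foreign keys to Roles and Reports are left out for brevity.

```sql
-- Junction table clustered on the composite key.
CREATE TABLE dbo.RoleReport (
    RoleId   INT NOT NULL,
    ReportId INT NOT NULL,
    CONSTRAINT PK_RoleReport PRIMARY KEY CLUSTERED (RoleId, ReportId)
);

-- Additional index for lookups that start from the report side.
CREATE NONCLUSTERED INDEX IX_RoleReport_ReportId
    ON dbo.RoleReport (ReportId);

-- All roles for a given report (42 is just an example value).
DECLARE @ReportId INT = 42;

SELECT RoleId
FROM dbo.RoleReport
WHERE ReportId = @ReportId;
```

Because a nonclustered index on a clustered table carries the clustering key as its row locator, the index on ReportID also holds RoleID, so this query should be answerable from the index alone.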



You would need a surrogate key for this relationship only if both of the following conditions were met:

  • You have additional attributes in your relationship (i.e. this table contains additional columns, for example a Date or something)
  • You have lots of tables that reference this relationship with a FOREIGN KEY

In this case, it would be better to have a single-column PRIMARY KEY to refer to in the FOREIGN KEY relationships.

Since you don't have such a need, just create a composite PRIMARY KEY.
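
For contrast, the case where a surrogate key does pay off might look like the sketch below. The GrantedOn column and the RoleReportSchedule table are hypothetical, invented only to illustrate "an additional attribute" and "a table that references the relationship"; the column types and constraint names are likewise assumptions.

```sql
-- Relationship table that has become an entity in its own right.
CREATE TABLE dbo.RoleReport (
    RoleReportId INT IDENTITY(1,1) NOT NULL,
    RoleId       INT  NOT NULL,
    ReportId     INT  NOT NULL,
    GrantedOn    DATE NOT NULL,  -- hypothetical extra attribute
    CONSTRAINT PK_RoleReport PRIMARY KEY (RoleReportId),
    -- The natural pair still has to stay unique.
    CONSTRAINT UQ_RoleReport_Role_Report UNIQUE (RoleId, ReportId)
);

-- Hypothetical referencing table: it needs one FK column
-- instead of carrying both RoleId and ReportId.
CREATE TABLE dbo.RoleReportSchedule (
    RoleReportScheduleId INT IDENTITY(1,1) NOT NULL,
    RoleReportId         INT  NOT NULL,
    RunAt                TIME NOT NULL,
    CONSTRAINT PK_RoleReportSchedule PRIMARY KEY (RoleReportScheduleId),
    CONSTRAINT FK_RoleReportSchedule_RoleReport
        FOREIGN KEY (RoleReportId) REFERENCES dbo.RoleReport (RoleReportId)
);
```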

+10




You don't really need the RoleReportId. It doesn't add anything to the relationship.

Many people try to avoid using naturally unique keys in real tables, opting for artificial unique keys instead, but I don't always agree with that. For example, if you can be sure that an SSN will never change, you can use it as a key. If that somehow changes in the future, you can fix it then.



But I'm not going to argue this point; there are good arguments on both sides. In this case, however, you definitely don't need an artificial unique key, since the combination of your two fields is, and will remain, unique.

+5




Unless you really need RoleReportId as a foreign key in some other table (usually you don't), go with option 2. It will require less storage, and that alone is likely to give a performance advantage. Besides, why have a column that you are never going to use?

+2




Semantically, the difference is what you use as the primary key.

I usually let the rest of my schema dictate what I do in this situation. If the junction table is purely an implementation of a many-to-many relationship, I tend to use a composite primary key. If I am hanging more information off the junction table, making it an entity in its own right, I am more inclined to give it its own ID, independent of the two tables it joins.

This is, of course, subjective. I am not suggesting that this is the One True Way (tm).

+2




If you have many rows, it is useful to have ordered indexes on the RoleId and/or ReportId columns, as this will speed up lookups. On the other hand, it will slow down insert and delete operations. This is the classic usage-profile trade-off.

Omit the RoleReportId PK unless it is otherwise required. It adds nothing to the relationship, forces the server to generate a useless number for every insert, and leaves the other two columns unordered, which slows down lookups.

In general, though, we are talking about milliseconds here. This only becomes relevant with a large amount of data (say, more than 10,000 rows).

+1




I would suggest choosing no PK for your second option. You can use indexes or a unique constraint on the combination of both columns.
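
What this suggestion amounts to, as a rough T-SQL sketch (the INT column types and the constraint name are assumptions):

```sql
-- No primary key; uniqueness of the pair is enforced by a constraint.
CREATE TABLE dbo.RoleReport (
    RoleId   INT NOT NULL,
    ReportId INT NOT NULL,
    CONSTRAINT UQ_RoleReport_Role_Report UNIQUE (RoleId, ReportId)
);
```

Without a clustered index the table is stored as a heap, and SQL Server by default backs the UNIQUE constraint with a unique nonclustered index.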

0




The advantage of using RoleReportID as a single-column primary key shows up when you (or someone else, depending on your company structure) need to write an interface that addresses individual role ↔ report relationships (for example, to remove one). At that point, you may prefer having to address only one column instead of two in order to identify the linking row.

Other than that, you don't need the RoleReportID column.
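
The difference for such an interface can be sketched as follows. The temporary table, its sample rows, and the variable names are made up for illustration only.

```sql
-- Throwaway junction table carrying both the surrogate key and the natural pair.
CREATE TABLE #RoleReport (
    RoleReportId INT IDENTITY(1,1) PRIMARY KEY,
    RoleId       INT NOT NULL,
    ReportId     INT NOT NULL,
    UNIQUE (RoleId, ReportId)
);

INSERT INTO #RoleReport (RoleId, ReportId)
VALUES (1, 10), (1, 20), (2, 10);

-- With the surrogate key, the caller passes a single value
-- (looked up here from one of the sample rows).
DECLARE @RoleReportId INT;
SELECT @RoleReportId = RoleReportId
FROM #RoleReport
WHERE RoleId = 1 AND ReportId = 20;

DELETE FROM #RoleReport
WHERE RoleReportId = @RoleReportId;

-- With the composite key, the caller passes both values.
DECLARE @RoleId INT = 2, @ReportId INT = 10;

DELETE FROM #RoleReport
WHERE RoleId = @RoleId AND ReportId = @ReportId;

DROP TABLE #RoleReport;
```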

0








