Structure of a many-to-many relationship in SQL Server with or without an additional primary key column?
Let's assume we have two tables, Roles and Reports, with a many-to-many relationship between them. The only solution I can think of is to create a cross table and name it RoleReport. I see two approaches to the structure of this table:
1. Columns: RoleReportId, RoleId, ReportId
PK: RoleReportId
2. Columns: RoleId, ReportId
PK: RoleId, ReportId
Is there any real difference between the two (performance or whatever)?
You will need a composite UNIQUE index on (RoleId, ReportId) anyway.
There is no point in not making it the PRIMARY KEY.
If you make it a CLUSTERED PRIMARY KEY (the default), it will perform better, since it will be smaller.
A clustered primary key contains only two columns in each record, RoleId and ReportId, whereas a secondary index would contain three: RoleId, ReportId and RoleReportId (as the row pointer).
You can create an additional index on ReportId that can be used to find all Roles for a given Report.
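As a rough sketch of the layout described above, assuming integer keys: the Name and Title columns and all constraint and index names are made up for illustration and do not come from the question.

```sql
-- Hypothetical parent tables; only the key columns come from the question.
CREATE TABLE dbo.Roles (
    RoleId INT NOT NULL CONSTRAINT PK_Roles PRIMARY KEY,
    Name   NVARCHAR(100) NOT NULL   -- assumed column
);

CREATE TABLE dbo.Reports (
    ReportId INT NOT NULL CONSTRAINT PK_Reports PRIMARY KEY,
    Title    NVARCHAR(200) NOT NULL -- assumed column
);

-- Approach 2: the pair (RoleId, ReportId) is the clustered primary key.
CREATE TABLE dbo.RoleReport (
    RoleId   INT NOT NULL
        CONSTRAINT FK_RoleReport_Roles REFERENCES dbo.Roles (RoleId),
    ReportId INT NOT NULL
        CONSTRAINT FK_RoleReport_Reports REFERENCES dbo.Reports (ReportId),
    CONSTRAINT PK_RoleReport PRIMARY KEY CLUSTERED (RoleId, ReportId)
);

-- Additional index for "all roles for a given report" lookups.
CREATE NONCLUSTERED INDEX IX_RoleReport_ReportId
    ON dbo.RoleReport (ReportId);
```

The clustered key already serves lookups that start from RoleId, so the extra index is only needed for the opposite direction.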
A surrogate key for this relationship would make sense if the following two conditions were met:
- You have additional attributes in your relationship (i.e. this table contains additional columns, for example Date or something)
- You have lots of tables that reference this relationship with a FOREIGN KEY
In this case, it would be nicer to have a single-column PRIMARY KEY to reference in the FOREIGN KEY relationships.
Since you don't seem to have such a need, just create a composite PRIMARY KEY.
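For comparison, the surrogate-key case might look roughly like this. It is an alternative definition of RoleReport, not an addition to the composite-key one; the GrantedOn column and the RoleReportAudit table are hypothetical stand-ins for "additional attributes" and "tables that reference this relationship", and the parent tables Roles and Reports are assumed to exist already.

```sql
-- Approach 1: surrogate key, with the natural pair still enforced as UNIQUE.
CREATE TABLE dbo.RoleReport (
    RoleReportId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_RoleReport PRIMARY KEY,
    RoleId       INT NOT NULL
        CONSTRAINT FK_RoleReport_Roles REFERENCES dbo.Roles (RoleId),
    ReportId     INT NOT NULL
        CONSTRAINT FK_RoleReport_Reports REFERENCES dbo.Reports (ReportId),
    GrantedOn    DATE NOT NULL,  -- hypothetical extra attribute
    CONSTRAINT UQ_RoleReport_Role_Report UNIQUE (RoleId, ReportId)
);

-- Hypothetical child table: it references the relationship through
-- one column instead of carrying both RoleId and ReportId.
CREATE TABLE dbo.RoleReportAudit (
    RoleReportAuditId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_RoleReportAudit PRIMARY KEY,
    RoleReportId      INT NOT NULL
        CONSTRAINT FK_RoleReportAudit_RoleReport
        REFERENCES dbo.RoleReport (RoleReportId),
    ViewedAt          DATETIME2 NOT NULL  -- assumed column
);
```

Note that the UNIQUE constraint on (RoleId, ReportId) is still there, which is the point made at the start of this answer: the composite index is needed either way.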
You don't really need the RoleReportId. It doesn't add anything to the relationship.
Many people try to avoid using natural unique keys in real tables, opting for artificial ones instead, but I don't always agree with that. For example, if you can be sure that an SSN will never change, you can use it as a key. If that somehow changes in the future, you can fix it then.
But I'm not going to argue this point; there are good arguments on both sides. In this case, however, you definitely don't need an artificial unique key, since the combination of your two fields is, and will remain, unique.
Semantically, the difference is what you use as the primary key.
I usually let the rest of my schema dictate what I do in this situation. If the cross table is purely an implementation of a many-to-many relationship, I tend to use a composite primary key. If I am hanging more information off the cross table, making it an entity in its own right, I am more inclined to give it its own ID, independent of the two tables it joins.
This is, of course, subjective. I am not suggesting that this is the only true path (tm).
If you have many rows, it is useful to have ordered indexes on the RoleId and/or ReportId columns, as this will speed up lookups. On the other hand, it will slow down insert and delete operations. Which matters more is the classic question of your usage profile ...
Omit the RoleReportId PK unless it is otherwise required. It adds nothing to the relationship, forces the server to generate a useless number for every insert, and leaves the other two columns unordered, which slows down lookups.
But in general we are talking about milliseconds here. This becomes relevant if there is a huge amount of data (say more than 10,000 rows) ...
The advantage of using RoleReportID as a single-column primary key shows up when you (or someone else, depending on your company structure) need to write an interface that addresses individual role ↔ report relationships (for example, to remove one). At that point you may prefer having to address only one column instead of two in order to identify the link row.
Other than that, you don't need the RoleReportID column.
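To make the difference concrete, here is roughly what removing one link looks like under each design. The variable values are placeholders, and only one of the two statements applies to any given schema, since the first assumes the RoleReportId column exists.

```sql
-- With a surrogate key: one value identifies the link row.
DECLARE @RoleReportId INT = 42;  -- placeholder value
DELETE FROM dbo.RoleReport
WHERE RoleReportId = @RoleReportId;

-- With the composite key: the caller has to pass both values.
DECLARE @RoleId INT = 3, @ReportId INT = 7;  -- placeholder values
DELETE FROM dbo.RoleReport
WHERE RoleId = @RoleId
  AND ReportId = @ReportId;
```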