Join multiple nvarchar columns
I have a table like this:
CREATE TABLE [dbo].[Table](
[Id] [INT] IDENTITY(1,1) NOT NULL,
[A] [NVARCHAR](150) NULL,
[B] [NVARCHAR](150) NULL,
[C] [NVARCHAR](150) NULL,
[D] [NVARCHAR](150) NULL,
[E] [NVARCHAR](150) NULL,
CONSTRAINT [con] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
and I am looking for ways to improve performance when joining against this table.
Option 1 - Concatenate the entire string into an nvarchar primary key, then do:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Table.PKString
As far as I know, this is bad practice.
Option 2 - Use:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Target.[A] + Target.[B] + Target.[C] + Target.[D] + Target.[E]
Option 3 - Use:
Source.[A] = Target.[A] And
...
Source.[E] = Target.[E]
Your option 1 will not work correctly, as it will treat ('ab','c') the same as ('a','bc').
Also, your columns are nullable, and concatenating a null yields null.
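Both problems can be seen with a couple of literal values. This is only a small sketch; the literals and the column aliases (ConcatCompare, ConcatWithNull) are made up for illustration, and the second query assumes the default CONCAT_NULL_YIELDS_NULL ON setting.

```sql
-- Two different column pairs produce the same concatenated string,
-- so a join on the concatenation would wrongly match them.
SELECT CASE WHEN N'ab' + N'c' = N'a' + N'bc'
            THEN 'equal'
            ELSE 'different'
       END AS ConcatCompare;   -- returns 'equal'

-- A single NULL operand makes the whole concatenation NULL,
-- and NULL = NULL is never true, so such rows never join.
SELECT N'ab' + NULL + N'c' AS ConcatWithNull;   -- returns NULL
```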
You cannot concatenate all columns into an nvarchar primary key because the columns are nullable, and even without that you would still risk a failure, as the maximum length would be 1500 bytes (5 columns x 150 characters x 2 bytes), which is well over the maximum index key size.
For similar length reasons, a composite index that uses all columns will also not work.
You can, however, create a computed column that takes all 5 column values as input to compute a checksum or hash value, and index that.
ALTER TABLE [dbo].[Table]
ADD HashValue AS CAST(hashbytes('SHA1', ISNULL([A], '') + ISNULL([B], '')+ ISNULL([C], '')+ ISNULL([D], '')+ ISNULL([E], '')) AS VARBINARY(20));
CREATE INDEX ix
ON [dbo].[Table](HashValue)
INCLUDE ([A], [B], [C], [D], [E])
Then use this in conjunction with the residual predicate on the other 5 columns in case of hash collisions.
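A minimal sketch of that join shape, assuming both sides have the HashValue computed column and index defined as above. The names Table1 and Table2 are placeholders for the source and target tables, and the selected columns are illustrative.

```sql
-- HashValue narrows the candidates through the index;
-- the five column comparisons weed out hash collisions.
SELECT source.Id AS SourceId,
       target.Id AS TargetId
FROM [dbo].[Table1] source
JOIN [dbo].[Table2] target
  ON source.HashValue = target.HashValue
 AND source.A = target.A
 AND source.B = target.B
 AND source.C = target.C
 AND source.D = target.D
 AND source.E = target.E;
```

With plain equality, a row that has NULL in any of the five columns will not match anything, even a row with NULL in the same position.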
If you want NULL values to compare as equal, you could use
SELECT *
FROM [dbo].[Table1] source
JOIN [dbo].[Table2] target
ON source.HashValue = target.HashValue
AND EXISTS(SELECT source.A,
source.B,
source.C,
source.D,
source.E
INTERSECT
SELECT target.A,
target.B,
target.C,
target.D,
target.E)
Note that the index created above basically reproduces the entire table, so you may want to make it the clustered index instead if your queries need it to be covering.