Join multiple nvarchar columns

I have a table like this:

CREATE TABLE [dbo].[Table](
    [Id] [INT] IDENTITY(1,1) NOT NULL,
    [A] [NVARCHAR](150) NULL,
    [B] [NVARCHAR](150) NULL,
    [C] [NVARCHAR](150) NULL,
    [D] [NVARCHAR](150) NULL,
    [E] [NVARCHAR](150) NULL,
 CONSTRAINT [con] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

and I'm looking for the best-performing way to join to this table on all five columns.

Option 1 - Concatenate all five columns into an nvarchar primary key, then join with:

Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Table.PKString

As far as I know, this is bad practice.

Option 2 - Join on the concatenation of the columns on both sides:

Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Target.[A] + Target.[B] + Target.[C] + Target.[D] + Target.[E]

Option 3 - Compare each column individually:

Source.[A] = Target.[A] And
...
Source.[E] = Target.[E]

1 answer


Your option 1 will not work correctly, as it will treat ('ab','c') the same as ('a','bc').
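To see the ambiguity concretely, the minimal T-SQL check below (literal values only, nothing taken from the real table) concatenates the two different pairs and compares the results:

```sql
-- Two different (A, B) pairs produce the same concatenated string N'abc',
-- so a join on the concatenation cannot tell them apart.
SELECT CASE
           WHEN N'ab' + N'c' = N'a' + N'bc' THEN N'same key'
           ELSE N'different keys'
       END AS ConcatenationResult;   -- returns N'same key'
```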

Also, your columns are nullable, and concatenating a NULL yields NULL.
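A quick illustration, assuming the default session setting CONCAT_NULL_YIELDS_NULL ON: the + operator propagates the NULL, while the built-in CONCAT function (SQL Server 2012 and later) treats NULL as an empty string.

```sql
DECLARE @A NVARCHAR(150) = N'x',
        @B NVARCHAR(150) = NULL;

SELECT @A + @B        AS PlusResult,    -- NULL: one NULL operand makes the whole expression NULL
       CONCAT(@A, @B) AS ConcatResult;  -- N'x': CONCAT replaces NULL with an empty string
```

CONCAT on its own does not remove the ambiguity from the previous point, since it also maps NULL and N'' to the same result.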

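You cannot concatenate all the columns into an nvarchar primary key anyway, because the result can be NULL and a primary key column cannot be NULL, which gives an error. Even without that problem you would still risk failures, as the maximum length of the concatenated value is 1,500 bytes (5 columns × 150 characters × 2 bytes per nvarchar character), which is significantly larger than the 900-byte maximum key size for a clustered index.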
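The 1,500-byte figure follows from nvarchar storing two bytes per character. The snippet below just does that arithmetic with DATALENGTH on a full-length 150-character value; the 900 in it is the clustered-index key limit mentioned above, hard-coded for comparison.

```sql
-- DATALENGTH returns bytes, not characters: 150 nvarchar characters = 300 bytes.
DECLARE @FullColumn NVARCHAR(150) = REPLICATE(N'x', 150);

SELECT DATALENGTH(@FullColumn)     AS BytesPerColumn,        -- 300
       5 * DATALENGTH(@FullColumn) AS MaxConcatenatedBytes,  -- 1500
       900                         AS MaxClusteredKeyBytes;
```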

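For the same length reasons, a composite index on all five columns will not work either, at least not as a clustered index (before SQL Server 2016 the 900-byte key limit applied to nonclustered indexes too).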

You can, however, create a computed column that takes all five column values as input to compute a checksum or hash value, and index that:



ALTER TABLE [dbo].[Table]
  ADD HashValue AS CAST(hashbytes('SHA1', ISNULL([A], '') + ISNULL([B], '')+ ISNULL([C], '')+ ISNULL([D], '')+ ISNULL([E], '')) AS VARBINARY(20));


CREATE INDEX ix
  ON [dbo].[Table](HashValue)
  INCLUDE ([A], [B], [C], [D], [E]) 

Then use this in conjunction with a residual predicate on the five original columns, in case of hash collisions.
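A self-contained sketch of that idea, using two hypothetical temp tables #Source and #Target that carry the same hashed computed column as above. The rows (N'ab', N'c', ...) and (N'a', N'bc', ...) concatenate to the same string and therefore get the same hash, so the hash equality alone matches both; the residual column-by-column predicate keeps only the genuine match. Plain = never treats two NULLs as equal, so this form only suits data where NULLs should not match.

```sql
-- Hypothetical temp tables mirroring the five nullable columns plus the hash.
CREATE TABLE #Source (
    A NVARCHAR(150) NULL, B NVARCHAR(150) NULL, C NVARCHAR(150) NULL,
    D NVARCHAR(150) NULL, E NVARCHAR(150) NULL,
    HashValue AS CAST(HASHBYTES('SHA1', ISNULL(A, N'') + ISNULL(B, N'') + ISNULL(C, N'')
                                       + ISNULL(D, N'') + ISNULL(E, N'')) AS VARBINARY(20))
);
CREATE TABLE #Target (
    A NVARCHAR(150) NULL, B NVARCHAR(150) NULL, C NVARCHAR(150) NULL,
    D NVARCHAR(150) NULL, E NVARCHAR(150) NULL,
    HashValue AS CAST(HASHBYTES('SHA1', ISNULL(A, N'') + ISNULL(B, N'') + ISNULL(C, N'')
                                       + ISNULL(D, N'') + ISNULL(E, N'')) AS VARBINARY(20))
);

INSERT INTO #Source (A, B, C, D, E) VALUES (N'ab', N'c', N'd', N'e', N'f');
INSERT INTO #Target (A, B, C, D, E) VALUES
    (N'ab', N'c', N'd', N'e', N'f'),   -- genuine match
    (N'a', N'bc', N'd', N'e', N'f');   -- same concatenation, so same hash, but different values

-- The hash equality narrows the candidates (and can use an index on HashValue);
-- the residual predicate rejects the false positive.
SELECT s.A, s.B, t.A AS TargetA, t.B AS TargetB
FROM #Source AS s
JOIN #Target AS t
  ON  s.HashValue = t.HashValue
  AND s.A = t.A AND s.B = t.B AND s.C = t.C
  AND s.D = t.D AND s.E = t.E;   -- returns one row
```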

If you want NULL values to compare as equal, you could use:

SELECT *
FROM   [dbo].[Table1] source
       JOIN [dbo].[Table2] target
         ON source.HashValue = target.HashValue
            AND EXISTS(SELECT source.A,
                              source.B,
                              source.C,
                              source.D,
                              source.E
                       INTERSECT
                       SELECT target.A,
                              target.B,
                              target.C,
                              target.D,
                              target.E) 
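This works because INTERSECT compares rows using distinct-value semantics, under which two NULLs count as equal, whereas = returns UNKNOWN for NULL = NULL (with the default ANSI_NULLS ON). A minimal check on a NULL variable:

```sql
DECLARE @NullValue NVARCHAR(150) = NULL;

SELECT CASE WHEN @NullValue = @NullValue
            THEN N'match' ELSE N'no match' END AS PlainEquality,   -- N'no match'
       CASE WHEN EXISTS (SELECT @NullValue, N'x'
                         INTERSECT
                         SELECT @NullValue, N'x')
            THEN N'match' ELSE N'no match' END AS IntersectCheck;  -- N'match'
```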

Note that the index created above basically replicates the entire table, so you may want to make it the clustered index instead if your queries need it to be covering.
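One way to apply that, sketched on a hypothetical temp table #Demo rather than on [dbo].[Table] itself: keep the identity primary key but make it nonclustered, and cluster on HashValue instead. On the real table this would mean dropping and recreating the [con] constraint as nonclustered (assuming nothing references it) and dropping ix. The sketch marks the computed column PERSISTED, which is a choice of this example, not something the answer requires.

```sql
CREATE TABLE #Demo (
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,  -- identity stays the logical key
    A NVARCHAR(150) NULL, B NVARCHAR(150) NULL, C NVARCHAR(150) NULL,
    D NVARCHAR(150) NULL, E NVARCHAR(150) NULL,
    HashValue AS CAST(HASHBYTES('SHA1', ISNULL(A, N'') + ISNULL(B, N'') + ISNULL(C, N'')
                                       + ISNULL(D, N'') + ISNULL(E, N'')) AS VARBINARY(20)) PERSISTED
);

-- The table rows themselves are now ordered by the hash, so a seek on HashValue
-- reads A..E from the same pages without a separate covering index.
CREATE CLUSTERED INDEX cix_Demo_HashValue ON #Demo (HashValue);
```

Since SHA1 values arrive in effectively random order, a clustered index on them will see more page splits on insert than one on an ever-increasing identity column.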