How do I enforce a specific character encoding in Microsoft SQL Server?
I need the string to be encoded in a known character encoding. So far my research and testing with MS SQL Server has shown that the documented encoding is "UCS-2", but the actual encoding (on the server in question) is "UCS-2LE".
That doesn't seem very reliable. I would like a function like ENCODE in Perl, Node, or most other languages, so that regardless of settings or settings changes, my hash function will work with a known input.
We can restrict the input string to hex characters, so in the worst case we can manually map the 16 possible input characters to the correct bytes. Does anyone have a recommendation on this?
Here's the Perl I'm using:
use strict;
use warnings;
use Digest::SHA qw/sha256/;
use Encode qw/encode/;
my $seed = 'DDFF5D36-F14D-495D-BAA6-3688786D6CFA';
my $string = '123456789';
my $target = '57392CD6A5192B6185C5999EB23D240BB7CEFD26E377D904F6FEF262ED176F97';
# Encode explicitly so the hash input is a known byte sequence
my $encoded = encode('UCS-2LE', $seed.$string);
my $sha256 = uc(unpack("H*", sha256($encoded)));
print "$target\n$sha256\n";
Which corresponds to MS SQL:
HASHBYTES('SHA2_256', 'DDFF5D36-F14D-495D-BAA6-3688786D6CFA123456789')
But I really want:
HASHBYTES('SHA2_256', ENCODE('UCS-2LE', 'DDFF5D36-F14D-495D-BAA6-3688786D6CFA123456789'))
That way, no matter how MS SQL encodes the input string, HASHBYTES will always work on a known byte array.
SQL Server uses UCS-2 only for columns, variables, and literals that have been declared as nvarchar. In all other cases it uses 8-bit characters in the code page of the current database's collation, unless specified otherwise (for example, with a COLLATE clause).
So, you need to specify a Unicode literal:
select HASHBYTES('SHA2_256', N'DDFF5D36-F14D-495D-BAA6-3688786D6CFA123456789');
Or you can use a variable or a table column of type nvarchar:
-- Variable
declare @var nvarchar(128) = N'DDFF5D36-F14D-495D-BAA6-3688786D6CFA123456789';
select HASHBYTES('SHA2_256', @var);
-- Table column
declare @t table(
    Value nvarchar(128)
);
insert into @t
select @var;
select HASHBYTES('SHA2_256', t.Value)
from @t t;
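To see why the N prefix changes the hash, here is a minimal client-side sketch in Python (my choice of language, not something from the question). It builds the byte sequence for an nvarchar value and for a varchar value and hashes both. The cp1252 code page is an assumption standing in for whatever the database collation actually uses; for these ASCII-only characters any single-byte code page should give the same bytes.

```python
import hashlib

seed = "DDFF5D36-F14D-495D-BAA6-3688786D6CFA"
string = "123456789"
text = seed + string

# Bytes hashed for an nvarchar value (N'...'): UTF-16 little-endian,
# which is identical to UCS-2LE for characters in the Basic Multilingual Plane.
nvarchar_bytes = text.encode("utf-16-le")

# Bytes hashed for a varchar value ('...'): one byte per character.
# cp1252 is an assumed code page; the real one depends on the collation.
varchar_bytes = text.encode("cp1252")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as an uppercase hex string."""
    return hashlib.sha256(data).hexdigest().upper()


nvarchar_hash = sha256_hex(nvarchar_bytes)
varchar_hash = sha256_hex(varchar_bytes)

print(len(varchar_bytes), len(nvarchar_bytes))
print("nvarchar:", nvarchar_hash)
print("varchar: ", varchar_hash)
```

Here nvarchar_hash is computed over the same bytes as the Perl script's encode('UCS-2LE', ...) call, so those two should agree with each other, while varchar_hash is computed over a different byte sequence.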
P.S. Of course, since Wintel is a little-endian platform, SQL Server uses the same byte order as the OS and hardware. Unless something new has appeared in SQL Server 2017, there is no way to get a big-endian representation natively.
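As a rough illustration of the byte-order point, this Python sketch (my own addition, with a hypothetical helper named swap_byte_pairs) shows the two layouts for the same short text, plus a byte swap that could be applied on the client side if a big-endian form were ever needed.

```python
text = "A1"

little = text.encode("utf-16-le")  # low byte first: 41 00 31 00
big = text.encode("utf-16-be")     # high byte first: 00 41 00 31


def swap_byte_pairs(data: bytes) -> bytes:
    """Swap each pair of bytes, turning UTF-16LE into UTF-16BE or back."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


print(little.hex(), big.hex())
print(swap_byte_pairs(little) == big)
```

Since the hash is computed over raw bytes, the two byte orders produce different digests for the same text, so both sides have to agree on one of them.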