T-SQL Convert unicode to emoji
Hooooookay so I know almost nothing about it, but it was a fun distraction from work and I hope you get what you need.
This emoji character, as you can see from your Unicode links, is actually three characters joined together: the first is MAN, the second is ZERO WIDTH JOINER and the third is BOY. The zero width joiner makes the characters on either side of it behave as a single character when you move the cursor around the page or select text. You can see this break down in any text editor that doesn't support it (such as SSMS). There, your cursor will "pause" between the MAN and BOY characters for one extra arrow-key press.
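To see the three code points (five UTF-16 code units) for yourself, here is a minimal sketch. It builds the sequence from its surrogate halves with NCHAR and inspects its size. The pairs 0xD83D/0xDC68 and 0xD83D/0xDC66 are the standard UTF-16 encodings of U+1F468 and U+1F466. The LEN result depends on your database's default collation, so treat that comment as approximate for your setup.

```sql
-- MAN (U+1F468) + ZERO WIDTH JOINER (U+200D) + BOY (U+1F466),
-- written as UTF-16 code units: each emoji needs a surrogate pair
DECLARE @e nvarchar(10) = NCHAR(0xD83D) + NCHAR(0xDC68)   -- MAN
                        + NCHAR(0x200D)                   -- ZERO WIDTH JOINER
                        + NCHAR(0xD83D) + NCHAR(0xDC66);  -- BOY

SELECT @e                       AS Emoji,
       DATALENGTH(@e)           AS Bytes,        -- 10: five 2-byte code units
       LEN(@e)                  AS Characters,   -- 5 without an _SC collation, 3 with one
       CONVERT(varbinary(20), @e) AS RawUtf16;   -- little-endian code units
```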
So, to answer your question, I have assumed that all your Unicode values are sequences of three code points with a joiner in the middle. If that isn't the case, you should be able to adapt the approach from here.
Starting from this very informative answer, you will see that SQL Server's handling of supplementary characters is somewhat incomplete. Hence, you need to do one of two things. You can change the collation of the database to one that supports supplementary characters. Alternatively, you can give SQL Server a helping hand by telling it when to split a Unicode code point into two nchar characters (a surrogate pair). Since I am assuming your sequences are all Emoji-Joiner-Emoji, this is not much of a problem for me, but it might be for you.
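Here is a minimal sketch of both routes mentioned above, assuming SQL Server 2012 or later (where the _SC collations exist). With a database collation that lacks the _SC flag, NCHAR only accepts 0 to 65535 and returns NULL for a supplementary code point, so the surrogate pair has to be built by hand. String functions such as LEN only treat a surrogate pair as one character when the expression carries an _SC collation. Latin1_General_100_CI_AS_SC is just one example collation name.

```sql
-- Which collation is the current database using? Names containing _SC
-- handle supplementary characters.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- U+1F468 (MAN) passed straight to NCHAR: works with an _SC database
-- collation, returns NULL otherwise
SELECT NCHAR(128104) AS Direct;

-- U+1F468 built by hand as a surrogate pair: works under any collation
DECLARE @man nvarchar(2) = NCHAR(0xD83D) + NCHAR(0xDC68);

-- LEN counts UTF-16 code units unless the expression has an _SC collation
SELECT LEN(@man COLLATE Latin1_General_100_CI_AS)    AS CodeUnits,   -- 2
       LEN(@man COLLATE Latin1_General_100_CI_AS_SC) AS CodePoints;  -- 1
```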
First, we need to split your character sequence into its component parts. For this I use a string splitting function based on Jeff Moden's tally table approach:
create function [dbo].[StringSplit]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns @results table(ItemNumber int, Item nvarchar(4000))
as
begin
declare @return nvarchar(4000);
-- Handle null @str values
select @str = case when len(isnull(@str,'')) = 0 then '' else @str end;
-- Start tally table with 10 rows.
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(@str) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(@str,t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,@str,s),0)-s,4000) from s)
insert into @results
select rn as ItemNumber
,Item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
return;
end
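It can help to look at what the splitter actually returns for your input before going further. The string starts with the delimiter, so item 1 is empty and the three code points are items 2, 3 and 4. That is why the query below asks for those numbers. On SQL Server 2022 or Azure SQL Database, the built-in STRING_SPLIT accepts a third enable_ordinal argument that also returns each item's position. That is shown commented out as an alternative to check against your version, not as part of this approach.

```sql
DECLARE @s nvarchar(50) = N'\U0001F468\U0000200D\U0001F466';

-- Tally-based splitter defined above; NULL for @num returns every item
SELECT ItemNumber, Item
FROM dbo.StringSplit(@s, N'\', NULL)
ORDER BY ItemNumber;
-- 1  (empty string: nothing before the first '\')
-- 2  U0001F468
-- 3  U0000200D
-- 4  U0001F466

-- Built-in alternative (SQL Server 2022+ / Azure SQL only):
-- SELECT ordinal, value FROM STRING_SPLIT(@s, N'\', 1) ORDER BY ordinal;
```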
Using this function, we can split the Unicode sequence into its 3 parts and manually pivot the data into 3 columns. The SO answer above explains why surrogates are needed. The code points of the two emoji lie between 65536 and 1114111. The script below calculates them with convert(int,(convert(varbinary(max),replace('<Your Uxxxxxxxxx unicode value>','U','0x'),1))). So we need to find their High Surrogate and Low Surrogate. This isn't necessary for the Zero Width Joiner, so for that one we just pass the binary value to the nchar function (note the lack of a conversion to int):
declare @s nvarchar(50) = '\U0001F468\U0000200D\U0001F466';
select nchar(55232+(i1/1024)) + nchar(56320+(i1%1024)) -- MAN emoji
+nchar(b2) -- JOINER
+nchar(55232+(i3/1024)) + nchar(56320+(i3%1024)) -- BOY emoji
as Emoji
from(select convert(int,(convert(varbinary(max),replace(s1.Item,'U','0x'),1))) as i1
,convert(varbinary(max),replace(s2.Item,'U','0x'),1) as b2
,convert(int,(convert(varbinary(max),replace(s3.Item,'U','0x'),1))) as i3
from dbo.StringSplit(@s,'\',2) as s1      -- MAN code point
,dbo.StringSplit(@s,'\',3) as s2          -- ZERO WIDTH JOINER
,dbo.StringSplit(@s,'\',4) as s3          -- BOY code point
) as a;
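As an aside, here is why the constants 55232 and 56320 in the query above produce a valid surrogate pair. The worked arithmetic below uses MAN (U+1F468 = 128104). The usual UTF-16 formulas subtract 0x10000 first: high = 0xD800 + (cp - 0x10000) / 1024 and low = 0xDC00 + (cp - 0x10000) % 1024. Because 0x10000 / 1024 = 64 and 0x10000 % 1024 = 0, these simplify to 55232 + cp / 1024 (55232 = 0xD800 - 64) and 56320 + cp % 1024 (56320 = 0xDC00). Decimal literals are used throughout to avoid any binary-to-int surprises.

```sql
-- Worked surrogate-pair arithmetic for one code point: U+1F468 (MAN)
DECLARE @cp int = 128104;  -- 0x1F468

SELECT
    -- Usual UTF-16 formulas: 55296 = 0xD800, 56320 = 0xDC00, 65536 = 0x10000
    55296 + (@cp - 65536) / 1024 AS HighTextbook,
    56320 + (@cp - 65536) % 1024 AS LowTextbook,
    -- Shortcut used in the query above (integer division, so 128104 / 1024 = 125)
    55232 + @cp / 1024           AS HighShortcut,   -- 55357 = 0xD83D
    56320 + @cp % 1024           AS LowShortcut;    -- 56424 = 0xDC68
```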
Putting all these nchar values together, we get the correct character representation of your emoji:
Output
Emoji
-----
👨‍👦
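Your sequences may not always be Emoji-Joiner-Emoji. One way to generalize is a small scalar function that turns any code point into one or two nchar characters, then glues the pieces back together in order. This is a sketch under stated assumptions, not part of the approach above. dbo.CodePointToNChar is a hypothetical name. STRING_AGG ... WITHIN GROUP requires SQL Server 2017 or later. The splitter is the dbo.StringSplit function defined earlier.

```sql
-- Hypothetical helper: returns the UTF-16 form of one Unicode code point
CREATE FUNCTION dbo.CodePointToNChar (@cp int)
RETURNS nvarchar(2)
AS
BEGIN
    RETURN CASE
        -- Supplementary planes: build the surrogate pair by hand
        WHEN @cp BETWEEN 65536 AND 1114111
            THEN NCHAR(55232 + @cp / 1024) + NCHAR(56320 + @cp % 1024)
        -- Basic Multilingual Plane: a single nchar is enough
        ELSE NCHAR(@cp)
    END;
END;
GO

DECLARE @s nvarchar(4000) = N'\U0001F468\U0000200D\U0001F466';

-- Split, convert each Uxxxxxxxx token to int, map to nchar, reassemble in order
SELECT STRING_AGG(
           dbo.CodePointToNChar(
               CONVERT(int, CONVERT(varbinary(max), REPLACE(Item, N'U', N'0x'), 1))),
           N'') WITHIN GROUP (ORDER BY ItemNumber) AS Emoji
FROM dbo.StringSplit(@s, N'\', NULL)
WHERE Item <> N'';  -- skip the empty item before the leading '\'
```

Because every code point goes through the same function, this handles any number of emoji, joiners or plain BMP characters in a sequence. It also works whether or not the database collation has the _SC flag.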