NSString unicode encoding issue

I am having trouble converting a string to something readable. I use

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

      

but I can't convert \U7ab6\U51b1 to ’

It shows up as 窶冱, which I don't want; it should display as ’s. Can anyone help me?



1 answer


The text ’s is being shown as 窶冱.

The ’ character is U+2019 RIGHT SINGLE QUOTATION MARK.

What happened is that you had the character sequence ’s presented to you in UTF-8 encoding, which comes out as these bytes:

’          s
E2 80 99   73

      

This sequence of bytes was then misinterpreted as if it were encoded in Windows code page 932 (Japanese, more or less Shift-JIS):



E2 80    99 73
窶        冱
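
To make the misinterpretation concrete, here is a minimal sketch that decodes the same four bytes twice, once as UTF-8 and once as Shift-JIS. DecodeBytes and DemoMisdecoding are hypothetical names made up for this illustration, and it assumes ARC and that Foundation's NSShiftJISStringEncoding behaves like cp932 for these particular bytes.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decode raw bytes with the given encoding.
// Returns nil if the bytes are not valid in that encoding.
static NSString *DecodeBytes(const uint8_t *bytes, NSUInteger length,
                             NSStringEncoding encoding) {
    NSData *data = [NSData dataWithBytes:bytes length:length];
    return [[NSString alloc] initWithData:data encoding:encoding];
}

// Hypothetical demo: the same bytes, read under two different encodings.
static void DemoMisdecoding(void) {
    const uint8_t bytes[] = {0xE2, 0x80, 0x99, 0x73}; // UTF-8 bytes for ’s
    NSString *asUTF8 = DecodeBytes(bytes, sizeof bytes, NSUTF8StringEncoding);
    NSString *asSJIS = DecodeBytes(bytes, sizeof bytes, NSShiftJISStringEncoding);
    NSLog(@"as UTF-8: %@, as Shift-JIS: %@", asUTF8, asSJIS);
}
```

If the assumption about the Shift-JIS table holds, the first string should come out as ’s and the second as 窶冱, matching the two byte layouts above.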

      

So, in this particular case, you can recover the string ’s by first encoding the characters into cp932 bytes and then decoding those bytes back into characters as UTF-8.
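
A rough sketch of that two-step recovery, for this particular case only. RepairMojibake is a hypothetical name, not a Foundation API, and the sketch assumes ARC and that NSShiftJISStringEncoding is close enough to cp932 for the characters involved.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: undo a "UTF-8 bytes decoded as Shift-JIS" mistake.
// Returns nil when the text cannot be round-tripped.
static NSString *RepairMojibake(NSString *garbled) {
    // Step 1: turn the wrongly decoded characters back into Shift-JIS bytes.
    NSData *bytes = [garbled dataUsingEncoding:NSShiftJISStringEncoding];
    if (bytes == nil) {
        return nil; // some character has no Shift-JIS representation
    }
    // Step 2: decode those bytes as what they originally were, UTF-8.
    // This also yields nil if the bytes are not valid UTF-8.
    return [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
}
```

Calling RepairMojibake(@"窶冱") should give back ’s under those assumptions. A nil result means the input was not damaged in this specific way, so the caller should not trust it as a general-purpose fix.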

However, this will not solve your real problem, namely that the strings were decoded incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence that encodes ’s happened to also be a valid Shift-JIS byte sequence. But that won't be the case for every UTF-8 byte sequence you can receive. Many other characters will be corrupted beyond repair.

You need to find where the bytes are read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
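
As a sketch of what the fix at the source might look like, assuming you can get at the raw bytes as an NSData before anything decodes them: StringFromUTF8Data is a hypothetical helper name, and where those bytes come from in your app is not something this example can know.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decode incoming bytes as UTF-8 at the point they
// enter the app, instead of letting another layer guess Shift-JIS.
static NSString *StringFromUTF8Data(NSData *rawBytes) {
    // A nil result means the bytes were not valid UTF-8, so the caller can
    // report the problem rather than silently fall back to another encoding.
    return [[NSString alloc] initWithData:rawBytes encoding:NSUTF8StringEncoding];
}
```

Treating a nil result as an error keeps bad input visible, which is usually easier to debug than text that has been quietly decoded with the wrong encoding.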
