NSString unicode encoding issue

Question

I am having trouble converting a string to something readable. I use

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 to ’

It shows up as 窶冱, which I don't want; it should display as ’. Can anyone help me?

iphone encoding unicode nsstring

munchine 27 Mar 11 at 5:18 am

1 answer

bobince · Accepted Answer · 2011-03-27T23:41:08+0000

The character you want displayed, ’, is U+2019 RIGHT SINGLE QUOTATION MARK.

What happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as these bytes:

’          s
E2 80 99   73

This sequence of bytes was then misinterpreted as if it were encoded in Windows code page 932 (Japanese, more or less Shift-JIS):

E2 80    99 73
窶        冱
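
To see the two byte groupings above for yourself, here is a minimal sketch. It is written in Python only because its codec names make the steps easy to follow; it is not Foundation code, and the variable names are purely illustrative. It assumes Python 3.8 or later for the spaced hex dump.

```python
# The characters from the question, written as code points: U+2019 followed by "s".
original = "\u2019s"

# Encode as UTF-8, as the sender did.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))  # hex dump of the four UTF-8 bytes

# Decode the same bytes with the wrong codec (code page 932).
# The four bytes are now read as two double-byte characters.
garbled = utf8_bytes.decode("cp932")

# Print code points rather than the characters themselves,
# so the output does not depend on the terminal's encoding.
print([f"U+{ord(ch):04X}" for ch in garbled])
```

The two code points it reports should be the same ones the question lists as \U7ab6 and \U51b1.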

So, in this particular case, you can recover the string ’s by first encoding the characters into cp932 bytes and then decoding those bytes back to characters using UTF-8.
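
As a rough illustration of that encode-then-decode round trip, here is a sketch in Python (chosen for its built-in cp932 codec, not because it is what your app uses). The helper name repair_cp932_mojibake is made up for this example, and the approach only works when the damage really was done by a cp932 decode.

```python
def repair_cp932_mojibake(garbled: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as code page 932."""
    # Step 1: map the wrong characters back to the bytes they were decoded from.
    raw = garbled.encode("cp932")
    # Step 2: decode those bytes with the encoding they were really in.
    return raw.decode("utf-8")


# The two garbled characters from the question, as code points.
garbled = "\u7ab6\u51b1"
repaired = repair_cp932_mojibake(garbled)

# ascii() escapes non-ASCII characters so this prints on any terminal.
print(ascii(repaired))
```

If either step raises a UnicodeError, the text was not produced by this particular mix-up, or it was damaged beyond what this trick can undo.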

However, this will not solve your real problem, namely that the strings were read incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence that encodes ’s also happens to be a valid Shift-JIS byte sequence. That will not be the case for every UTF-8 byte sequence you may receive, so many other characters will be irreparably mangled.
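
A small sketch of why the round trip cannot be relied on, again in Python for illustration only. The function name survives_cp932 is hypothetical. It checks whether the UTF-8 bytes of a string can be decoded as cp932 at all, which is a necessary condition for the text to be recoverable afterwards.

```python
def survives_cp932(text: str) -> bool:
    """Return True if text's UTF-8 bytes also form a valid cp932 byte sequence."""
    try:
        text.encode("utf-8").decode("cp932")
        return True
    except UnicodeDecodeError:
        # Some byte is not a legal cp932 lead/trail combination, so a lenient
        # decoder would have dropped or replaced it and the original is lost.
        return False


print(survives_cp932("\u2019s"))  # quotation mark followed by "s"
print(survives_cp932("\u2019 "))  # quotation mark followed by a space
```

The first case is the lucky one from the question: the trailing byte 0x99 pairs up with 0x73 ("s"). In the second, 0x99 is followed by 0x20, which is not a valid trail byte, so the check is expected to fail.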

You need to find where the bytes are read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
