NSString Unicode encoding issue
I am having trouble converting a string to something readable. I use
NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];
but I can't convert \U7ab6\U51b1 to ’.
It shows up as 窶冱, which I don't want; it should display as ’s. Can anyone help me?
The character you want is U+2019 RIGHT SINGLE QUOTATION MARK (’).
What happened is that you had the character sequence ’s
delivered to you in UTF-8 encoding, which represents it as these bytes:
’ s
E2 80 99 73
This sequence of bytes was then misinterpreted as if it were encoded in Windows code page 932 (Japanese, more or less Shift-JIS):
E2 80 99 73
窶 冱
So, in this particular case, you can recover the string ’s
by first encoding the garbled characters back into cp932 bytes and then decoding those bytes as UTF-8.
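The round trip described above can be sketched in a few lines. This is only an illustration in Python, not the asker's Objective-C code: the variable names are made up for the example, and Python's built-in "cp932" codec stands in for whatever decoder misread the bytes.

```python
# Illustration only: reproduce the mojibake, then undo it.
original = "\u2019s"  # the intended text: ’s

# Step 1: the text is correctly encoded as UTF-8.
utf8_bytes = original.encode("utf-8")
hex_dump = " ".join(f"{b:02X}" for b in utf8_bytes)  # E2 80 99 73

# Step 2: something decodes those bytes with the wrong codec (cp932).
garbled = utf8_bytes.decode("cp932")  # U+7AB6 U+51B1: 窶冱

# Step 3: the repair: back to cp932 bytes, then decode as UTF-8.
repaired = garbled.encode("cp932").decode("utf-8")

print(hex_dump)
print(garbled)
print(repaired)
```

In Cocoa, the analogous repair would presumably go through `dataUsingEncoding:` and `initWithData:encoding:` on NSString, with `NSShiftJISStringEncoding` as an assumed stand-in for cp932; that mapping is an assumption here and is not confirmed by the original post.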
However, this will not solve your real problem, namely that the text was decoded incorrectly in the first place. You got 窶冱
in this case because the UTF-8 byte sequence that encodes ’s
also happens to be a valid Shift-JIS byte sequence. That won't be the case for every UTF-8 byte sequence you might receive. Many other characters will be irreparably mangled.
You need to find where the bytes are read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
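To see why the repair trick cannot be relied on, here is a small sketch under the same assumptions as before (Python's "cp932" codec standing in for the faulty decoder, hypothetical variable names). The same quotation mark followed by a space instead of an s yields the bytes E2 80 99 20, and 99 20 is not a valid two-byte cp932 sequence.

```python
# Illustration only: a UTF-8 sequence that is NOT valid cp932.
raw_bytes = "\u2019 ".encode("utf-8")  # bytes E2 80 99 20

# A strict cp932 decoder rejects the input outright.
try:
    raw_bytes.decode("cp932")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# A lenient decoder substitutes U+FFFD for the byte it cannot map,
# so the original byte is gone and cannot be recovered afterwards.
lossy_text = raw_bytes.decode("cp932", errors="replace")
information_lost = "\ufffd" in lossy_text

# The real fix: decode the incoming bytes as UTF-8 in the first place.
correct_text = raw_bytes.decode("utf-8")

print(strict_decode_ok, information_lost, repr(correct_text))
```

Once a replacement character has been substituted, re-encoding to cp932 no longer reproduces the original bytes, which is why the decoding step at the source has to be corrected.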