Printing characters from a string gives different results

I am confused about UTF-8 strings in D. Can anyone explain why the code below gives different output for the two loops? And why is "abç"[2] == 'ç' false and not true?

import std.stdio;

void main()
{
    string s = "abç";
    for(int i = 0; i < s.length; i++)
    {
        dchar c = s[i];
        writefln("%#x", cast(int)c);
    }
    writeln();
    foreach(dchar c; s)
    {
        writefln("%#x", cast(int)c);
    }
}

      

This code outputs:

[Image: screenshot of the program output]



1 answer


The character ç has a Unicode code point greater than 7F (it is E7), so within a UTF-8 string it is represented by more than one char (the pair C3 A7). s[2] is only the third char in s, which is the first char of 'ç', so comparing it with 'ç' gives false.

Your first loop prints the "bytes" as they are (each one taken as s[i]). Your second loop decodes the UTF-8 in s and gives you each code point as a UTF-32 dchar.
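To make the difference concrete, here is a minimal sketch that decodes the string by hand with std.utf.decode, which is roughly what foreach with a dchar loop variable does. The helper name codePoints is made up for illustration, and the sketch assumes the source file is saved as UTF-8 with ç as the single precomposed character U+00E7.

```d
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical helper (not a library function): collect the code points
// of a UTF-8 string by decoding one code point at a time.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past the whole sequence
    return result;
}

void main()
{
    string s = "abç";

    // length and indexing work on UTF-8 code units (char), not characters.
    assert(s.length == 4);
    assert(s[2] == 0xC3 && s[3] == 0xA7);
    assert(s[2] != 'ç'); // 0xC3 compared with 0xE7

    // Decoding yields three code points, the last one being U+00E7.
    foreach (c; codePoints(s))
        writefln("%#x", cast(uint) c);
}
```

If you need to index by character rather than by code unit, converting once to a dstring (for example with std.conv.to) is one option, at the cost of four bytes per character.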



E7 and C3 A7 are just the UTF-32 and UTF-8 encodings of the same character (U+00E7).
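The relationship between the two encodings can be checked with a few bit operations. The sketch below is only illustrative: encodeTwoByte is a hypothetical helper covering just the two-byte range U+0080 to U+07FF, and its result is compared against std.utf.encode from the standard library.

```d
import std.utf : encode;

// Hypothetical helper: manual UTF-8 encoding for code points in
// U+0080..U+07FF, which use the two-byte form 110xxxxx 10xxxxxx.
ubyte[2] encodeTwoByte(dchar c)
{
    assert(c >= 0x80 && c <= 0x7FF);
    ubyte[2] r;
    r[0] = cast(ubyte)(0xC0 | (c >> 6));   // lead byte: upper 5 bits
    r[1] = cast(ubyte)(0x80 | (c & 0x3F)); // continuation byte: lower 6 bits
    return r;
}

void main()
{
    // 0xE7 is 11100111 in binary: upper bits 00011, lower bits 100111.
    auto manual = encodeTwoByte('\u00E7');
    assert(manual[0] == 0xC3 && manual[1] == 0xA7);

    // The standard library produces the same two code units.
    char[4] buf;
    size_t n = encode(buf, cast(dchar) 0xE7);
    assert(n == 2);
    assert(buf[0] == 0xC3 && buf[1] == 0xA7);
}
```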

For reference: http://www.fileformat.info/info/unicode/char/e7/index.htm
