Printing characters from a string gives different results
I am confused about UTF-8 strings in D. Can anyone explain why the code below gives different results? Why is "abç"[2] == 'ç' false and not true?
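import std.stdio;

void main()
{
    string s = "abç";

    // Indexing a string yields individual UTF-8 code units (chars)
    for (size_t i = 0; i < s.length; i++)
    {
        dchar c = s[i];
        writefln("%#x", cast(int)c);
    }
    writeln();

    // A dchar loop variable makes foreach decode UTF-8 into code points
    foreach (dchar c; s)
    {
        writefln("%#x", cast(int)c);
    }
}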
This code outputs:
0x61
0x62
0xc3
0xa7

0x61
0x62
0xe7
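The character ç has a Unicode code point greater than 7F (it is E7), so in a UTF-8 string it is represented by more than one char (the pair C3 A7). s[2] is only the third char in s, which is the first of the two chars that make up 'ç'. That is why "abç"[2] == 'ç' is false: it compares the single code unit C3 with the code point E7.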
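To get at the third character (code point) rather than the third char, the string has to be decoded first. Below is a minimal sketch, assuming a Phobos version that still auto-decodes narrow strings in range algorithms such as walkLength; the helper name decodeAt is hypothetical and only wraps std.utf.decode.

```d
import std.conv : to;
import std.range : walkLength;
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical helper: returns the code point that starts at code-unit
// position `offset` of s, decoded with std.utf.decode.
dchar decodeAt(string s, size_t offset)
{
    return decode(s, offset); // decode advances offset past the code point
}

void main()
{
    string s = "abç";

    // s.length counts UTF-8 code units; walkLength auto-decodes code points
    writefln("code units: %s, code points: %s", s.length, s.walkLength);

    // s[2] is a single char: the first code unit of 'ç', not 'ç' itself
    writefln("s[2] = %#x", cast(int) s[2]);

    // Decoding from position 2 recovers the whole character
    writefln("decoded at 2 = %#x", cast(int) decodeAt(s, 2));

    // Converting to dstring (UTF-32) makes indexing line up with characters
    dstring d = s.to!dstring;
    writefln("d[2] == 'ç': %s", d[2] == 'ç');
}
```

Converting to dstring costs an allocation and four bytes per character, so for a one-off lookup decoding at a known position is usually enough.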
Your first loop prints the code units ("bytes") as they are, taken one at a time as s[i]. Your second loop, because its loop variable is a dchar, decodes the UTF-8 in s into code points, that is, into UTF-32.
E7 and C3 A7 are simply the UTF-32 and UTF-8 encodings of the same character (U+00E7).
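The step from E7 to C3 A7 is mechanical: code points U+0080 through U+07FF use the two-byte UTF-8 pattern 110xxxxx 10xxxxxx, with the top five of the eleven bits in the first byte and the low six in the second. For E7 that gives 0xC0 | (0xE7 >> 6) = 0xC3 and 0x80 | (0xE7 & 0x3F) = 0xA7. A small sketch that does this by hand (the function name encodeTwoByte is hypothetical, not a Phobos API) and cross-checks it against std.utf.encode:

```d
import std.stdio : writefln;
import std.utf : encode;

// Hypothetical helper: hand-encodes a code point in U+0080..U+07FF
// using the two-byte UTF-8 pattern 110xxxxx 10xxxxxx.
ubyte[2] encodeTwoByte(dchar cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF, "not a two-byte code point");
    ubyte[2] bytes;
    bytes[0] = cast(ubyte)(0xC0 | (cp >> 6));   // 110 + top 5 bits
    bytes[1] = cast(ubyte)(0x80 | (cp & 0x3F)); // 10 + low 6 bits
    return bytes;
}

void main()
{
    dchar cp = '\u00E7'; // 'ç'
    ubyte[2] manual = encodeTwoByte(cp);
    writefln("manual:  %#x %#x", manual[0], manual[1]);

    // Cross-check with Phobos: encode writes UTF-8 into a char[4] buffer
    // and returns the number of code units used
    char[4] buf;
    size_t n = encode(buf, cp);
    writefln("std.utf: %#x %#x (%s code units)", cast(int) buf[0], cast(int) buf[1], n);
}
```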
For reference: http://www.fileformat.info/info/unicode/char/e7/index.htm