Working with Chinese Characters in C

It is known that in C a string is represented by an array of chars, and on most 32-bit processors a char takes one byte, or eight bits. So a string consists of an array of one-byte values.

Since extended characters such as Chinese and Japanese need more than 8 bits each, I am a little confused about how this works.

For example, I tested that I can define an array of Chinese characters the same way as an array of English letters, using char array[100]. So my question is:

Is there a mechanism that bridges the gap between ordinary 8-bit characters and characters wider than 8 bits, so that both can be treated the same way, as in the example above?
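For reference, the test I mean is something like this (a minimal sketch, assuming the source file and terminal are both UTF-8), which compiles and prints correctly:

    #include <stdio.h>

    int main(void) {
        /* Chinese text stored in a plain char array, assuming
           the source file is saved as UTF-8 */
        char array[100] = "这是中文";
        printf("%s\n", array);
        return 0;
    }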



2 answers


Yes, using multibyte character encodings. This is a pretty broad question, but reading up on multibyte encodings such as UTF-8 is the place to start.
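To see what is actually going on (a minimal sketch, assuming a UTF-8 source file and terminal), note that the char array from the question simply holds several bytes per Chinese character:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char s[100] = "你好";   /* two Chinese characters, UTF-8 encoded */

        /* strlen counts bytes, not characters: this prints 6,
           because each of the two characters occupies 3 bytes */
        printf("strlen(s) = %zu\n", strlen(s));

        for (size_t i = 0; s[i] != '\0'; i++)
            printf("byte %zu: 0x%02X\n", i, (unsigned char)s[i]);
        return 0;
    }

The string functions in <string.h> keep working because they only care about bytes and the terminating '\0'; what changes is that one on-screen character may span several array elements.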





I would suggest using the UTF-8 encoding, since it leaves ordinary characters (byte values <= 127) untouched, while characters outside that range are encoded as two-, three-, or four-byte sequences in which every byte has a value >= 128. You can also use libiconv for related conversion problems: http://www.gnu.org/software/libiconv/
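A conversion with the iconv API (provided by glibc and by GNU libiconv) might look like the sketch below; the choice of GBK as the target encoding is just for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>

    int main(void) {
        char in[] = "你好";      /* UTF-8 input, assuming a UTF-8 source file */
        char out[64];
        char *inp = in, *outp = out;
        size_t inleft = strlen(in), outleft = sizeof out;

        /* iconv_open takes the target encoding first, then the source */
        iconv_t cd = iconv_open("GBK", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");
        else
            printf("converted to %zu GBK bytes\n", sizeof out - outleft);

        iconv_close(cd);
        return 0;
    }

With glibc this works out of the box; with a standalone GNU libiconv you may need to link with -liconv.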


