Working with Chinese Characters in C

It is known that in C a string is represented by an array of chars, and on most 32-bit processors a char takes one byte, or eight bits. So a string consists of an array of one-byte values.

Since extended characters such as Chinese and Japanese need more than 8 bits each, I am a little confused about how this works.

For example, I tested that I can define an array of Chinese characters the same way as an array of English letters, using char array[100]. So my question is:

Is there a mechanism that bridges the gap between ordinary 8-bit characters and characters wider than 8 bits, so that both can be treated the same way, as in the example above?
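For reference, the test I mean is something like this (a minimal sketch, assuming the source file and terminal are both UTF-8), which compiles and prints correctly:

    #include <stdio.h>

    int main(void) {
        /* Chinese text stored in a plain char array, assuming
           the source file is saved as UTF-8 */
        char array[100] = "这是中文";
        printf("%s\n", array);
        return 0;
    }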



2 answers


Yes, using multibyte character encodings. This is a pretty broad question, but reading up on multibyte encodings such as UTF-8 is the place to start.
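To see what is actually going on (a minimal sketch, assuming a UTF-8 source file and terminal), note that the char array from the question simply holds several bytes per Chinese character:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char s[100] = "你好";   /* two Chinese characters, UTF-8 encoded */

        /* strlen counts bytes, not characters: this prints 6,
           because each of the two characters occupies 3 bytes */
        printf("strlen(s) = %zu\n", strlen(s));

        for (size_t i = 0; s[i] != '\0'; i++)
            printf("byte %zu: 0x%02X\n", i, (unsigned char)s[i]);
        return 0;
    }

The string functions in <string.h> keep working because they only care about bytes and the terminating '\0'; what changes is that one on-screen character may span several array elements.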





I would suggest using the UTF-8 encoding, since it leaves ordinary characters (byte values <= 127) untouched, while characters outside that range are encoded as two-, three-, or four-byte sequences in which every byte has a value >= 128. You can also use libiconv for related conversion problems: http://www.gnu.org/software/libiconv/
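A conversion with the iconv API (provided by glibc and by GNU libiconv) might look like the sketch below; the choice of GBK as the target encoding is just for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>

    int main(void) {
        char in[] = "你好";      /* UTF-8 input, assuming a UTF-8 source file */
        char out[64];
        char *inp = in, *outp = out;
        size_t inleft = strlen(in), outleft = sizeof out;

        /* iconv_open takes the target encoding first, then the source */
        iconv_t cd = iconv_open("GBK", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");
        else
            printf("converted to %zu GBK bytes\n", sizeof out - outleft);

        iconv_close(cd);
        return 0;
    }

With glibc this works out of the box; with a standalone GNU libiconv you may need to link with -liconv.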


