Unicode and network communication

I am planning to develop a Windows-based client application and a platform-agnostic server application. The client sends messages to the server, and those messages can be in English or in other languages. Should I use Unicode to encode messages in my client application? What is the general practice among networking applications? My client and server will use a dedicated messaging protocol over TCP/IP. Which Unicode encodings do Windows and UNIX support by default? Should my protocol carry an encoding-type field so that the receiver knows how to decode the messages? Please advise.

+1




3 answers


Look at UTF-8, a Unicode encoding built from 8-bit bytes that is compact for English and other Western languages.

It's always a good idea to include the encoding type in the protocol in case you want to support something else at a later stage.

UTF-8 is supported by all major operating systems and programming languages.
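
To illustrate carrying the encoding type in the protocol, here is a minimal sketch of one possible frame layout. The one-byte encoding IDs, the 4-byte length prefix, and the function names are all assumptions made up for this example, not part of any standard.

```python
import struct

# Hypothetical encoding identifiers for this sketch only.
ENCODINGS = {1: "utf-8", 2: "utf-16-le"}

def pack_message(text, encoding_id=1):
    """Build a frame: 1-byte encoding id, 4-byte big-endian length, payload."""
    payload = text.encode(ENCODINGS[encoding_id])
    return struct.pack("!BI", encoding_id, len(payload)) + payload

def unpack_message(frame):
    """Read the header, then decode the payload with the declared encoding."""
    encoding_id, length = struct.unpack("!BI", frame[:5])
    payload = frame[5:5 + length]
    return payload.decode(ENCODINGS[encoding_id])
```

With a layout like this, the receiver never has to guess the encoding, and a new ID can be added later without breaking clients that only send UTF-8.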

+3




If you are in control of both server and client, I would pick one encoding and stick with it.

I would suggest either UTF-8 (most compact for English and Western languages) or UTF-16 (be sure to specify the byte order).
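
A quick way to see the trade-off is to compare encoded sizes and look at what byte order does to the bytes. This is only a sketch; the helper name `encoded_sizes` is made up for the example.

```python
def encoded_sizes(text):
    """Return the number of bytes the text occupies in each encoding."""
    names = ("utf-8", "utf-16-le", "utf-16-be")
    return {name: len(text.encode(name)) for name in names}

# ASCII text: UTF-8 uses 1 byte per character, UTF-16 uses 2.
print(encoded_sizes("hello"))

# Japanese text: UTF-8 uses 3 bytes per character here, UTF-16 uses 2.
print(encoded_sizes("日本語"))

# Same character, same encoding family, different byte order on the wire.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'
```

So UTF-8 wins for mostly-ASCII traffic, UTF-16 can be smaller for East Asian text, and with UTF-16 both sides must agree on which of the two byte layouts is used.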

+1




You can use whatever encoding you want, you just need to be careful with things like byte order. Windows internally uses UTF-16 (little-endian), so if you expect most systems to be Windows then you should probably go for that. Otherwise, I would recommend UTF-8, which has no byte ordering issues to worry about.

If you go with UTF-16 (or UTF-32, which I definitely don't recommend), be clear about what the endianness of the data on the wire is. Then every client that reads or writes a Unicode character on a network socket must convert between the platform's native endianness and the network endianness; this is either a no-op or a byte swap.
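
As a rough sketch of that conversion, the code below treats big-endian as the wire order (an assumption for this example; the protocol could just as well fix little-endian) and swaps each 16-bit code unit only when the host is little-endian. The names `to_wire` and `from_wire` are hypothetical.

```python
import sys
from array import array

# Codec name matching this machine's native byte order.
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

def to_wire(text):
    """Native-order UTF-16 code units -> big-endian bytes (assumed wire order)."""
    units = array("H", text.encode(NATIVE_UTF16))  # 16-bit units, native order
    if sys.byteorder == "little":
        units.byteswap()  # byte swap on little-endian hosts
    return units.tobytes()  # already correct (no-op path) on big-endian hosts

def from_wire(data):
    """Big-endian bytes from the socket -> native-order code units -> str."""
    units = array("H", data)
    if sys.byteorder == "little":
        units.byteswap()
    return units.tobytes().decode(NATIVE_UTF16)
```

In Python you would normally just call `text.encode("utf-16-be")` and `data.decode("utf-16-be")` and let the codec handle byte order; the explicit swap is shown only to make the no-op versus byte-swap distinction visible, as it would be in C or C++ code working on `wchar_t` buffers.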

0

