Unicode and network communication
I am planning to develop a Windows-based client application and a platform-agnostic server application. The client application basically sends messages to the server application. The client application can send messages in English or other languages. Should I use Unicode to encode messages in my client application? What is the general practice among networking applications? My client and server applications will use a dedicated messaging protocol over TCP/IP. Which Unicode encodings do Windows and UNIX support by default? Should I change the encoding type in my protocol for decoding Unicode messages? Please advise.
You can use whatever encoding you want; you just need to be careful with things like byte order. Windows internally uses UTF-16 (little-endian), so if you expect most systems to be Windows then you should probably go for that. Otherwise, I would recommend UTF-8, which has no byte-ordering issues to worry about.
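As a rough illustration of the UTF-8 option, here is a minimal Python sketch of how a message could be put on the wire. The 4-byte length prefix and the function names `encode_message` and `decode_message` are assumptions made for this example, not part of any existing protocol; your own protocol may frame messages differently.

```python
import struct

def encode_message(text: str) -> bytes:
    """Encode text as UTF-8, prefixed with its byte length (hypothetical framing)."""
    payload = text.encode("utf-8")
    # "!I" = unsigned 32-bit integer in network (big-endian) byte order.
    # Only the length field has a byte order; the UTF-8 payload does not.
    return struct.pack("!I", len(payload)) + payload

def decode_message(frame: bytes) -> str:
    """Decode one complete frame produced by encode_message."""
    (length,) = struct.unpack("!I", frame[:4])
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("incomplete frame")
    return payload.decode("utf-8")
```

The length counts bytes, not characters, because a single non-ASCII character takes more than one byte in UTF-8. A Windows client that holds its strings as UTF-16 internally would convert to UTF-8 just before sending.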
If you go with UTF-16 (or UTF-32, which I definitely don't recommend), be clear about what the endianness of the data on the wire is. Then every client that reads or writes a Unicode character to a network socket converts between the platform's native endianness and the network endianness; this is either a no-op or a byte swap.
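To make the "no-op or byte swap" point concrete, here is a small Python sketch. It assumes the protocol fixes big-endian UTF-16 on the wire; that choice, the `WIRE_ENCODING` constant, and the helper names are all hypothetical and only for illustration.

```python
import sys

# Assumption for this sketch: the protocol declares big-endian UTF-16 on the wire.
WIRE_ENCODING = "utf-16-be"

def to_wire(text: str) -> bytes:
    """Encode text with an explicit byte order (no BOM is written)."""
    return text.encode(WIRE_ENCODING)

def from_wire(data: bytes) -> str:
    """Decode bytes received from the wire."""
    return data.decode(WIRE_ENCODING)

def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)

def native_to_wire(native: bytes) -> bytes:
    """Convert native-order UTF-16 bytes to wire order: a no-op or a byte swap."""
    return native if sys.byteorder == "big" else swap_utf16_bytes(native)
```

On a little-endian machine such as a typical Windows PC, `native_to_wire` performs the swap; on a big-endian machine it returns the data unchanged. In practice the codec name passed to `encode` and `decode` already handles this, so the manual swap is shown only to illustrate what happens underneath.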