UTF-8 Compatible IO Streams

Does GCC's standard library, Boost, or some other library provide iostream-compatible versions of ifstream or ofstream that support conversion between UTF-8 encoded (file) streams and a std::vector<wchar_t> or std::wstring?

+2




2 answers


Your question doesn't quite work. UTF-8 is a specific encoding, whereas wchar_t is a data type. Moreover, wchar_t is intended by the standard to represent the system's character set, but the details are left entirely to the platform and the standard imposes no requirements.

Hence, the right thing to ask for is, first of all, a conversion between the narrow, multibyte system encoding and the fixed-width, wide-string system encoding. This functionality is provided by std::mbstowcs and std::wcstombs. There might also be a locale facet somewhere that wraps this up, but that's a bit of a niche area of the library.
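As a minimal sketch of that first step, the two functions below wrap std::mbstowcs and std::wcstombs. The names widen and narrow are hypothetical helpers, not part of any library, and both assume the input contains no embedded NUL characters. The encodings involved are whatever the current C locale says they are, which is exactly the "system encoding" caveat above.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: narrow multibyte system encoding -> wchar_t string.
std::wstring widen(const std::string& narrow_str)
{
    // Every wide character consumes at least one byte of input,
    // so size() + 1 elements (including the terminator) always suffice.
    std::vector<wchar_t> buf(narrow_str.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), narrow_str.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    assert(n <= narrow_str.size());
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wchar_t string -> narrow multibyte system encoding.
std::string narrow(const std::wstring& wide_str)
{
    // MB_CUR_MAX is the maximum number of bytes per character in the current locale.
    std::vector<char> buf(wide_str.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide_str.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");
    return std::string(buf.data(), n);
}
```

A program would normally call std::setlocale(LC_ALL, "") once at startup so that "the current locale" means the user's environment rather than the default "C" locale. Whether that environment happens to use UTF-8 is up to the platform.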

If you want to convert between the opaque "system encoding" prescribed by the standard and a specific encoding prescribed by your serialized data source or sink, you need an additional library. I would recommend POSIX iconv(), which is widely available. (The Windows API takes a different approach and offers its own dedicated conversion functions.)
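For illustration, here is a rough sketch of a UTF-8 to wchar_t conversion with iconv(). The helper name utf8_to_wide is hypothetical. The sketch assumes an iconv implementation that accepts the encoding name "WCHAR_T" (glibc and GNU libiconv do) and that declares the input parameter as char** (some platforms use const char** instead).

```cpp
#include <cassert>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into the platform's wchar_t encoding.
std::wstring utf8_to_wide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();

    // Assumption: the implementation knows the "WCHAR_T" pseudo-encoding.
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::vector<char> in(utf8.begin(), utf8.end());   // iconv wants a mutable buffer
    std::vector<wchar_t> out(utf8.size());            // at most one wchar_t per input byte

    char* in_ptr = in.data();
    std::size_t in_left = in.size();
    char* out_ptr = reinterpret_cast<char*>(out.data());
    std::size_t out_left = out.size() * sizeof(wchar_t);

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or incomplete UTF-8 input");

    // Output is produced in whole wchar_t units.
    assert(out_left % sizeof(wchar_t) == 0);
    const std::size_t produced = out.size() - out_left / sizeof(wchar_t);
    return std::wstring(out.data(), produced);
}
```

Swapping the two arguments of iconv_open gives the opposite direction, with an output buffer sized for the worst case of several bytes per wide character. On systems where iconv lives in a separate library, linking with -liconv may be needed.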



C++11 eases the problem a bit by adding an explicit family of UTF-encoded string types and literals, and presumably also transcoding tools between them (although I have never seen anyone implement them).
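To make the new types and literals concrete, the following sketch stores the same two characters in each encoding form and counts code units. The variable and function names are made up for the example. The UTF-8 length is measured with sizeof because the element type of u8 literals is char up to C++17 and char8_t from C++20 on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+20AC EURO SIGN lies in the Basic Multilingual Plane.
const std::u16string euro16 = u"\u20AC";   // UTF-16: 1 code unit
const std::u32string euro32 = U"\u20AC";   // UTF-32: 1 code unit
const std::size_t euro8_bytes = sizeof(u8"\u20AC") - 1;   // UTF-8 bytes, minus the NUL

// U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair.
const std::u16string face16 = u"\U0001F600";   // UTF-16: 2 code units
const std::u32string face32 = U"\U0001F600";   // UTF-32: 1 code unit
const std::size_t face8_bytes = sizeof(u8"\U0001F600") - 1;

// Hypothetical check that the literals are encoded as described above.
void check_code_unit_counts()
{
    assert(euro16.size() == 1 && euro32.size() == 1 && euro8_bytes == 3);
    assert(face16.size() == 2 && face32.size() == 1 && face8_bytes == 4);
}
```

Unlike wchar_t, the encoding of these types is fixed by the standard, so the counts do not depend on the platform.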

Here is my standard answer from past posts on the topic: Q1, Q2, Q3. C++11 will be a joy once it's fully available :-)

+2




The C++11 solution is to wrap the UTF-8 stream in a suitable wbuffer_convert:

#include <codecvt>
#include <fstream>
#include <locale>   // std::wbuffer_convert
#include <string>

int main()
{
    std::ifstream utf8file("test.txt"); // assumes the file holds UTF-8 data
    // Decode the UTF-8 bytes into wide characters as they are read.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(utf8file.rdbuf());
    std::wistream ucsbuf(&conv);
    std::wstring line;
    std::getline(ucsbuf, line); // line now holds UCS-2 or UCS-4, depending on the platform
}

      



This works with Visual Studio 2010 and clang++/libc++, but unfortunately not with GCC at the time of writing.

Until this becomes widespread, third party libraries are indeed the best solution.
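The question also asks about the output direction. A sketch of it, under the same assumptions as the reading example, wraps the byte sink in a wbuffer_convert and writes through a std::wostream. The helper name to_utf8 is made up, and a std::ostringstream stands in for an std::ofstream so the result can be inspected. Note that <codecvt> and std::wbuffer_convert have been deprecated since C++17, so compilers may warn.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 by writing it
// through a converting stream buffer.
std::string to_utf8(const std::wstring& wide)
{
    std::ostringstream bytes;   // stands in for a std::ofstream opened on a file
    {
        std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(bytes.rdbuf());
        std::wostream out(&conv);
        out << wide;
        out.flush();            // push all converted bytes into the underlying buffer
    }
    const std::string result = bytes.str();
    assert(wide.empty() || !result.empty());
    return result;
}
```

The explicit flush is deliberate: it avoids relying on the converting buffer to write out pending data when it is destroyed.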

+4

