std::u32string conversion to/from std::string and std::u16string

I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I can now use C++11 I am looking at the new string types.

It looks like I can use std::string, std::u16string and std::u32string for UTF-8, UTF-16 and UTF-32. I also found std::codecvt_utf8 and std::codecvt_utf16, which do conversions between char and char16_t or char32_t, and the higher-level std::wstring_convert, but that one only seems to work with bytes/std::string, and there is not much documentation.

Am I supposed to somehow use wstring_convert for UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32? The only examples I actually found are for UTF-8 to UTF-16, and I'm not even sure those would be correct on Linux, where wchar_t is usually treated as UTF-32... Or should I do something more complicated with those codecvt facets directly?

Or is it all just still not usable, and should I stick with my existing small routines using 8-, 16- and 32-bit unsigned integers?



1 answer


If you read the documentation on CppReference.com for wstring_convert, codecvt_utf8, codecvt_utf16 and codecvt_utf8_utf16, each page includes a table that tells you exactly which combination to use for the various UTF conversions.


And yes, you would use std::wstring_convert to facilitate the conversions between the various UTFs. Despite its name, it is not limited to std::wstring; it actually works with any std::basic_string type (which is what std::string, std::wstring and std::uXXstring are all based on).
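
As a minimal sketch of that general pattern (the facets, element types and sample literal below are just illustrative choices): the byte side of from_bytes()/to_bytes() is always a std::string, while the wide side follows the element type you instantiate the template with.

#include <codecvt>
#include <locale>
#include <string>

void example()
{
    // Same class template, different "wide" element types.
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> u32conv;

    std::wstring   ws  = wconv.from_bytes("caf\xC3\xA9");   // UTF-8 bytes -> std::wstring
    std::u32string u32 = u32conv.from_bytes("caf\xC3\xA9"); // UTF-8 bytes -> std::u32string
    std::string    u8  = u32conv.to_bytes(u32);             // UTF-32 -> UTF-8 bytes
}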



Class template std::wstring_convert performs conversions between a byte string std::string and a wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions, and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.

For example:

#include <codecvt>
#include <locale>
#include <string>

typedef std::string u8string;

// UTF-16 -> UTF-8
u8string To_UTF8(const std::u16string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

// UTF-32 -> UTF-8
u8string To_UTF8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// UTF-8 -> UTF-16
std::u16string To_UTF16(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}

// UTF-32 -> UTF-16: codecvt_utf16 produces the UTF-16 code units as raw
// bytes in a std::string, which are then reinterpreted as char16_t.
std::u16string To_UTF16(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    std::string bytes = conv.to_bytes(s);
    return std::u16string(reinterpret_cast<const char16_t*>(bytes.c_str()), bytes.length()/sizeof(char16_t));
}

// UTF-8 -> UTF-32
std::u32string To_UTF32(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}

// UTF-16 -> UTF-32: the UTF-16 code units are handed to codecvt_utf16 as raw bytes.
std::u32string To_UTF32(const std::u16string &s)
{
    const char16_t *pData = s.c_str();
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    return conv.from_bytes(reinterpret_cast<const char*>(pData), reinterpret_cast<const char*>(pData+s.length()));
}
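
For illustration, those helpers could then be chained like this (the sample literal is just an example; the assert only checks the round trip):

#include <cassert>

void demo()
{
    std::u16string utf16 = u"caf\u00e9";

    std::u32string utf32 = To_UTF32(utf16); // UTF-16 -> UTF-32
    u8string       utf8  = To_UTF8(utf32);  // UTF-32 -> UTF-8

    assert(To_UTF16(utf8) == utf16);        // back to UTF-16, unchanged
}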

      
