Python C-API: how to pass a null-terminated UTF-16 C string to my Python app without converting to UTF-8?

Pythonistas,

I am trying to write a Python extension in C that feeds a large number of null-terminated UTF-16 C strings to my Python application. My C library guarantees that its Unicode code units are always 16 bits wide. I am NOT using wchar_t in my C library on Linux because the size of wchar_t can vary.

I found many functions (PyUnicode_AsUTF8String, PyString_FromStringAndSize, PyString_FromString, etc.) that do almost exactly what I want, but they all work on 8-bit character / string representations.

The Python documentation (http://docs.python.org/howto/unicode.html) says:

"Under the hood, Python represents Unicode strings as 16 or 32 bit integers, depending on how the Python interpreter is compiled."

I really want to avoid the performance penalty of converting all UTF-16 C strings to UTF-8 C strings just for the Python interface, especially on Windows if the Python interpreter uses 16 bits "under the hood" as well.

Any ideas on how to solve this are much appreciated.

Thank you, Thomas

1 answer


You cannot avoid copying the data (unless you bypass the Python C API), but you can create Python unicode objects directly from UTF-16 data with PyUnicode_DecodeUTF16; see http://docs.python.org/c-api/unicode.html#utf-16-codecs .


