Python C-API: how to pass a null-terminated UTF-16 C string to my Python app without converting to UTF-8?
Pythonistas,
I am trying to write a Python extension in C that feeds a large number of null-terminated UTF-16 strings from C to my Python application. My C library guarantees that its Unicode strings always use 16-bit code units. I am NOT using wchar_t in my C library on Linux because the size of wchar_t can vary.
I found many functions (PyUnicode_AsUTF8String, PyString_FromStringAndSize, PyString_FromString, etc.) that do the kind of thing I want, but they all work with 8-bit character / string representations.
The Python documentation (http://docs.python.org/howto/unicode.html) says:
"Under the hood, Python represents Unicode strings as 16 or 32 bit integers, depending on how the Python interpreter is compiled."
I really want to avoid the performance penalty of converting all UTF-16 C strings to UTF-8 C strings just for the Python interface, especially on Windows if the Python interpreter uses 16 bits "under the hood" as well.
Any ideas on how to solve this would be much appreciated.
Thank you, Thomas
You cannot avoid copying the data (unless you bypass the Python C API), but you can create Python unicode objects directly from UTF-16 data using PyUnicode_DecodeUTF16; see http://docs.python.org/c-api/unicode.html#utf-16-codecs .
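As a rough sketch of what that call looks like: PyUnicode_DecodeUTF16 takes a pointer to the raw bytes, the length in bytes (it does not look for the null terminator, so the 16-bit code units have to be counted first), an error-handler name, and a pointer to an int giving the byte order (-1 little-endian, 1 big-endian, 0 to auto-detect from a BOM). The example below exercises the same C-API function from Python through ctypes.pythonapi so it can be run without building an extension; it assumes CPython 3, and the helper names (count_utf16_units, from_utf16_cstring) and the sample buffer are made up for illustration. In a C extension the call would have the same shape, with the uint16_t pointer cast to const char *.

```python
import ctypes
import sys

# Bind the C-API function exported by the running CPython interpreter.
_decode_utf16 = ctypes.pythonapi.PyUnicode_DecodeUTF16
_decode_utf16.restype = ctypes.py_object
_decode_utf16.argtypes = [
    ctypes.c_void_p,               # const char *s: raw UTF-16 data
    ctypes.c_ssize_t,              # Py_ssize_t size: length in BYTES
    ctypes.c_char_p,               # const char *errors
    ctypes.POINTER(ctypes.c_int),  # int *byteorder
]


def count_utf16_units(ptr):
    """Count 16-bit code units before the terminating 0 (hypothetical helper)."""
    n = 0
    while ptr[n] != 0:
        n += 1
    return n


def from_utf16_cstring(ptr):
    """Build a Python str from a null-terminated, native-endian UTF-16 buffer."""
    n_units = count_utf16_units(ptr)
    # Explicit native byte order, so a leading BOM is not interpreted.
    byteorder = ctypes.c_int(-1 if sys.byteorder == "little" else 1)
    return _decode_utf16(
        ctypes.cast(ptr, ctypes.c_void_p),
        n_units * 2,               # two bytes per code unit
        b"strict",
        ctypes.byref(byteorder),
    )


# Stand-in for a buffer handed over by the C library (assumption for the demo).
sample = "h\u00e9llo \U0001D11E"   # includes a character that needs a surrogate pair
codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
encoded = sample.encode(codec) + b"\x00\x00"
buf = (ctypes.c_uint16 * (len(encoded) // 2)).from_buffer_copy(encoded)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_uint16))

decoded = from_utf16_cstring(ptr)
print(decoded == sample)
```

The data is still copied once into the new Python object, as the answer says, but there is no intermediate UTF-8 conversion.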