Potential memory leak when converting a wide char (wchar_t) string to a Python string

Question


I have the following Cython code in a .pyx file that converts a wchar_t * to a Python string (unicode):

(All code below is Python 2.7.4.)

cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    cdef PyObject *p = PyUnicode_FromWideChar(buf, buflen)
    return <unicode>p

I called this function in a loop like this:

cdef wchar_t* buf = <wchar_t*>calloc(100, sizeof(wchar_t))
# ... copy some wide string to buf

for n in range(30000):
    u = wc_to_pystr(buf)  # <== behaves as if it's a memory leak

free(buf)

I tested this on Windows: the process memory (as seen in Task Manager) keeps increasing, so I suspect there is a memory leak here.
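Task Manager reports the whole process's working set, which can move for reasons unrelated to leaks. A more direct check is to count the bytes Python itself still has allocated after the loop. The sketch below is a Python 3 reproduction (the tracemalloc module does not exist in 2.7) that swaps Cython for ctypes as an assumption for illustration: it calls PyUnicode_FromWideChar through ctypes.pythonapi, keeps the result as a raw address, and turns it into an object with an extra reference, roughly what the `<unicode>p` cast does. The helpers leaky_convert and traced_growth are hypothetical names.

```python
import ctypes
import tracemalloc

# Hypothetical ctypes reproduction of the question's pattern (not Cython).
# Indexing pythonapi returns a fresh function object with its own restype.
_raw_from_wide = ctypes.pythonapi["PyUnicode_FromWideChar"]
_raw_from_wide.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]
_raw_from_wide.restype = ctypes.c_void_p  # raw address, like a PyObject* return


def leaky_convert(buf, n):
    """Convert n wchar_t units of buf; the C API's new reference is never released."""
    addr = _raw_from_wide(ctypes.addressof(buf), n)
    # Like <unicode>p: .value hands back an extra, owned reference to the object.
    return ctypes.cast(addr, ctypes.py_object).value


def traced_growth(func, iterations):
    """Bytes still allocated (per tracemalloc) after calling func `iterations` times."""
    tracemalloc.start()
    func()  # warm-up call so one-time allocations are excluded
    before, _ = tracemalloc.get_traced_memory()
    u = None
    for _ in range(iterations):
        u = func()  # rebinding u drops our reference to the previous value
    u = None
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


text = "hello, wide world"  # ASCII, so len(text) equals the wchar_t count
buf = ctypes.create_unicode_buffer(100)  # zero-filled, like calloc(100, sizeof(wchar_t))
buf.value = text

print(traced_growth(lambda: leaky_convert(buf, len(text)), 30000))
```

With this pattern the printed number grows roughly in proportion to the iteration count, about one string's size per call, because every converted string stays alive.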

This is surprising because:

- As I understand the API, PyUnicode_FromWideChar() copies the buffer.
- Each time the variable 'u' is assigned a different value, the previous value should be freed.
- Since the original buffer ('buf') stays as is and is only freed after the loop ends, I expected memory to stop growing after a certain point.
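The first and third points can be checked directly. This Python 3 sketch again uses ctypes in place of Cython (an assumption made so the example runs without a compiler): it converts a wide buffer with PyUnicode_FromWideChar, overwrites the buffer, and shows that the resulting string kept its own copy of the characters.

```python
import ctypes

# Python 3 sketch using ctypes in place of Cython (assumption for illustration).
_from_wide = ctypes.pythonapi["PyUnicode_FromWideChar"]
_from_wide.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]
_from_wide.restype = ctypes.py_object  # ctypes owns the returned new reference

buf = ctypes.create_unicode_buffer(100)  # zero-filled wide buffer
buf.value = "first"
u = _from_wide(ctypes.addressof(buf), len(buf.value))

# Overwrite the C buffer in place; the str object is unaffected
# because PyUnicode_FromWideChar copied the characters.
buf.value = "second"
print(u)          # first
print(buf.value)  # second
```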

Any idea where I am going wrong? Is there a better way to convert a wide char string to a Python unicode object?


python python-2.7 unicode cython

user2248790 · Dec 7, 2014 at 17:05


1 answer

user2248790 · Accepted Answer · 2014-12-08T09:14:32+0000

Solved!! Solution:

(Note: the solution involves a snippet of my code that was not in the original question. I had no idea it held the key to this issue. Apologies to those who thought about it ...)

In the Cython .pyx file, I had declared the Python C API function as follows:

PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

I checked the docs at https://github.com/cython/cython/blob/master/Cython/Includes/cpython/__init__.pxd

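I was declaring the return type as PyObject *, so Cython treated the result as a plain C pointer and never released the new reference that PyUnicode_FromWideChar returns; the <unicode>p cast then created an additional, owned reference that I had not accounted for. The solution was to change the return type in the signature to object: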

object PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

As those docs explain, declaring the return type as "object" tells Cython that the function returns a new reference, which Cython takes ownership of and releases when it is no longer used, so each string is freed correctly inside the for loop. The modified wc_to_pystr looks like this:
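The difference between the two declarations can be observed without compiling anything. In this Python 3 sketch ctypes stands in for Cython (an assumption, not the answer's code): a ctypes.c_void_p result type plays the role of PyObject *, and ctypes.py_object plays the role of object, since ctypes takes ownership of the new reference that a py_object-returning function hands back. Comparing sys.getrefcount on the two results exposes the orphaned reference; _from_wide_with is a hypothetical helper.

```python
import ctypes
import sys


def _from_wide_with(restype):
    """Return a fresh PyUnicode_FromWideChar wrapper with the given result type."""
    # Indexing pythonapi (unlike attribute access) creates a new function object,
    # so the two wrappers below keep separate restypes.
    func = ctypes.pythonapi["PyUnicode_FromWideChar"]
    func.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]
    func.restype = restype
    return func


as_pointer = _from_wide_with(ctypes.c_void_p)  # analogue of a PyObject* return
as_object = _from_wide_with(ctypes.py_object)  # analogue of an object return

buf = ctypes.create_unicode_buffer("hello, wide world")
n = len(buf.value)  # ASCII text, so this equals the wchar_t count

# PyObject*-style: the address is turned into an object with an extra INCREF,
# and the reference the C API returned is never released.
leaked = ctypes.cast(as_pointer(ctypes.addressof(buf), n), ctypes.py_object).value

# object-style: ctypes adopts the new reference and drops it with `owned`.
owned = as_object(ctypes.addressof(buf), n)

print(sys.getrefcount(leaked) - sys.getrefcount(owned))  # 1 (the orphaned reference)
```

The one extra reference per call is exactly what keeps every converted string alive in the 30000-iteration loop.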

cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    # declared as returning object: Cython owns the new reference and releases it
    p = PyUnicode_FromWideChar(buf, buflen)
    return p
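To try the corrected ownership rule without a compiler, here is a pure-Python counterpart of the fixed function. It is a Python 3 sketch that again uses ctypes in place of Cython (an assumption for illustration; the answer itself is Cython): ctypes.py_object as the result type corresponds to declaring the return as object, None stands in for a NULL pointer, and the while loop is a stand-in for wcslen.

```python
import ctypes

# Hypothetical ctypes counterpart of the fixed Cython wc_to_pystr (Python 3).
_from_wide = ctypes.pythonapi["PyUnicode_FromWideChar"]
_from_wide.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]
_from_wide.restype = ctypes.py_object  # like `object`: ctypes owns the new reference


def wc_to_pystr(buf):
    """Copy a NUL-terminated c_wchar array into a str; None stands in for NULL."""
    if buf is None:
        return None
    # Count wchar_t units up to the terminator, as wcslen() would.
    buflen = 0
    while buf[buflen] != "\0":
        buflen += 1
    return _from_wide(ctypes.addressof(buf), buflen)


buf = ctypes.create_unicode_buffer(100)  # zero-filled, like calloc(100, sizeof(wchar_t))
buf.value = "hello, wide world"
for n in range(30000):
    u = wc_to_pystr(buf)  # each previous u is released when rebound
print(u)  # hello, wide world
```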
