Memory leak in Python extension when array is created with PyArray_SimpleNewFromData() and returned

Question

I wrote a simple Python extension to simulate a 3-bit A/D converter. It accepts a floating-point array as input and returns an array of the same size containing the quantized input values. Here is my (simplified) module:

static PyObject *adc3(PyObject *self, PyObject *args) {
  PyArrayObject *inArray = NULL, *outArray = NULL;
  double *pinp = NULL, *pout = NULL;
  npy_intp nelem;
  npy_intp dims[1], i;  /* npy_intp (not int): the type PyArray_SimpleNewFromData expects for dims */

  /* Get arguments:  */
  if (!PyArg_ParseTuple(args, "O:adc3", &inArray))
    return NULL;

  nelem = PyArray_DIM(inArray,0); /* size of the input array */
  pout = (double *) malloc(nelem*sizeof(double));
  pinp = (double *) PyArray_DATA(inArray);

  /*   ADC action   */
  for (i = 0; i < nelem; i++) {
    if (pinp[i] >= -0.5) {
    if      (pinp[i] < 0.5)   pout[i] = 0;
    else if (pinp[i] < 1.5)   pout[i] = 1;
    else if (pinp[i] < 2.5)   pout[i] = 2;
    else if (pinp[i] < 3.5)   pout[i] = 3;
    else                      pout[i] = 4;
    }
    else {
    if      (pinp[i] >= -1.5) pout[i] = -1;
    else if (pinp[i] >= -2.5) pout[i] = -2;
    else if (pinp[i] >= -3.5) pout[i] = -3;
    else                      pout[i] = -4;
    }
  }

  dims[0] = nelem;

  outArray = (PyArrayObject *)
               PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
  //Py_INCREF(outArray);

  return PyArray_Return(outArray); 
} 

/* ==== methods table ====================== */
static PyMethodDef mwa_methods[] = {
  {"adc3", adc3, METH_VARARGS, "n-bit Analog-to-Digital Converter (ADC)"},
  {NULL, NULL, 0, NULL}
};

/* ==== Initialize ====================== */
PyMODINIT_FUNC initmwa()  {
    Py_InitModule("mwa", mwa_methods);
    import_array();  // for NumPy
}
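
For reference, the threshold chain in adc3 can be isolated into a helper that compiles without Python or NumPy. This is only a sketch: quantize3 and quantize_array are hypothetical names that are not part of the module above, and the loop is meant to reproduce the same thresholds and the same clamping to [-4, 4].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map x to the nearest integer level, clamped to [-4, 4].
   Thresholds match the if/else chain in adc3: an interior level k covers
   [k - 0.5, k + 0.5); level 4 takes everything from 3.5 upward. */
static double quantize3(double x) {
    int level;
    for (level = 4; level > -4; level--) {
        if (x >= level - 0.5)
            return (double)level;
    }
    return -4.0;  /* everything below -3.5 (and NaN, as in the original chain) */
}

/* Hypothetical helper: quantize n samples from in[] into out[]. */
static void quantize_array(const double *in, double *out, size_t n) {
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = quantize3(in[i]);
}
```

Keeping the arithmetic in a plain C function like this makes it possible to check the quantization separately from any memory-management question in the wrapper.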

I expected that, if reference counting is handled correctly, Python's garbage collection would free the memory used by the output array (often enough) whenever the same name is rebound to a new array. So I tested it on some dummy (but voluminous) data with this code:

for i in xrange(200): 
    a = rand(1000000)
    b = mwa.adc3(a)
    print i

Here the name "b" is reused, and the memory that adc3() takes from the heap is expected to be returned to the system. I used gnome-system-monitor to check. Contrary to my expectations, the memory owned by Python grew rapidly and was released only when I exited the program (I am using IPython). For comparison, I tried the same procedure with the standard NumPy functions zeros() and copy():

for i in xrange(1000): 
    a = np.zeros(10000000)
    b = np.copy(a)
    print i

The latter code does not cause any memory build-up. I read a lot of the standard documentation and material on the internet, and tried both with and without Py_INCREF(outArray). All in vain: the problem persisted.

However, I found a solution at http://wiki.scipy.org/Cookbook/C_Extensions/NumPy_arrays . The author provides the matsq() extension, which creates an array and returns it. When I used the calls suggested by the author:

outArray = (PyArrayObject *) PyArray_FromDims(nd,dims,NPY_DOUBLE);
pout = (double *) outArray->data;

instead of mine

pout = (double *) malloc(nelem*sizeof(double));
outArray = (PyArrayObject *)
            PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
/* with or without Py_INCREF(outArray), the result is the same */

the memory leak is gone! The program is working correctly.

Question: can anyone explain why PyArray_SimpleNewFromData() does not provide correct reference counting while PyArray_FromDims() does?

Many thanks.

Addendum: I have probably exceeded the comment length or time limit, so I am adding my reply to Alex here. I tried to set the OWNDATA flag like this:

outArray->flags |= OWNDATA;

but I got the error "OWNDATA is not declared". The rest is in the comments. Thank you in advance.

SOLVED: the correct way to set the flag is

outArray->flags |= NPY_ARRAY_OWNDATA;

Now it works.

Alex, sorry.

python arrays numpy memory-leaks

Benkevitch, Jan 12, 2015 at 23:21

1 answer

Alex Martelli · Accepted Answer · 2015-01-12T23:36:21+0000

The problem is not with PyArray_SimpleNewFromData, which produces a properly reference-counted PyObject*. Rather, it is with your malloc, whose result is assigned to pout and then never freed.

As the docs at http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html clearly state in documenting PyArray_SimpleNewFromData:

"... the ndarray will NOT own its data. When this ndarray is deallocated, the pointer will NOT be freed. ... If you want the memory to be freed as soon as the ndarray is deallocated, then simply set the OWNDATA flag on the returned ndarray."

(The emphasis on "not" is mine.) In other words, you are observing exactly the "will not be freed" behavior that is so clearly documented, and you are not taking the step the docs recommend if you want to avoid it.
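
The ownership rule can be modeled in plain C, without NumPy. The following is a toy sketch and not NumPy's implementation: ToyArray, toy_wrap, toy_release, and the owns_data field are hypothetical names standing in for the ndarray, PyArray_SimpleNewFromData, array deallocation, and the OWNDATA flag respectively.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ndarray: a data pointer plus an ownership flag. */
typedef struct {
    double *data;
    size_t  nelem;
    int     owns_data;   /* plays the role of the OWNDATA flag */
} ToyArray;

/* Wrap an existing buffer WITHOUT taking ownership of it
   (models what wrapping caller-allocated memory does by default). */
static ToyArray *toy_wrap(double *data, size_t nelem) {
    ToyArray *arr = malloc(sizeof *arr);
    if (arr == NULL)
        return NULL;
    arr->data = data;
    arr->nelem = nelem;
    arr->owns_data = 0;
    return arr;
}

/* Release the wrapper. The data buffer is freed only if the wrapper owns it.
   Returns 1 if the buffer was freed, 0 if it was left for the caller. */
static int toy_release(ToyArray *arr) {
    int freed = 0;
    assert(arr != NULL);
    if (arr->owns_data) {
        free(arr->data);
        freed = 1;
    }
    free(arr);
    return freed;
}
```

With owns_data left at 0, toy_release frees only the wrapper, and the buffer leaks unless the caller frees it; that is the situation in the question's loop. Setting owns_data to 1 before release corresponds to the line from the question's update, outArray->flags |= NPY_ARRAY_OWNDATA;. The other route, which the asker also found, is to let the library allocate the buffer and write the results into it.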
