Boost.Python: passing a large data structure to Python

I am currently embedding Python in my C++ program using Boost.Python in order to use matplotlib. Now I am stuck at the point where I have to build a large data structure, say a dense matrix of 10000x10000. I want to plot the columns of that matrix, and I figured I have several options:

  • Iterating and copying every value into a numpy array -> I don't want to do this for the obvious reason that it doubles the memory consumption.
  • Iterating and exporting every value into a file and then importing it in Python -> I could do this entirely without Boost.Python, and I don't think it is a good way.
  • Allocate the matrix in Python and just update the values from C++ -> But as pointed out here, it is not recommended to switch back and forth between the Python interpreter and my C++ program.
  • Somehow expose the matrix to Python without copying it -> All I can find on this is how to extend Python with C++ classes, not how to embed it.

Which one is the best option in terms of performance and memory consumption, or is there an even better way to accomplish such a task?





2 answers


To prevent copying in Boost.Python, wrap the existing memory instead of transferring the elements:

If the matrix has contiguous C-style memory, consider using the NumPy C-API. The function PyArray_SimpleNewFromData() can be used to create an ndarray object that wraps memory that has been allocated elsewhere. This allows you to expose the data to Python without copying or moving each element between the languages. The how to extend documentation is a great resource for working with the NumPy C-API:

Sometimes you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. [...] A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed. [...] If you want the memory to be freed as soon as the ndarray is deallocated, then simply set the OWNDATA flag on the returned ndarray.



Additionally, while the plotting function may still create copies of the array internally, it can do so within the C-API, taking advantage of the underlying memory layout.


If performance is a concern, it may also be worth reconsidering the plotting itself:

  • sampling the data and plotting a subset may be sufficient, depending on the distribution of the data
  • using a raster-based backend, such as Agg, often outperforms vector-based backends on large datasets
  • benchmarking other plotting tools geared towards big data, such as Vispy


Although Tanner's answer gave me a big leap forward, I ended up using Boost.NumPy, an unofficial extension to Boost.Python that can easily be added to it. It wraps the NumPy C-API and makes it more convenient and safer to use.









