Not enough memory in C++: write to file instead and read the data back in when needed?

I am developing a tool for wavelet image analysis and machine learning on Linux machines in C++. It is limited by the size of the images, the number of scales and their corresponding filters (up to 2048x2048 doubles) for each of the N orientations, as well as by the additional memory and processing overhead of the machine learning algorithm.

Unfortunately, my Linux systems programming skills are shallow at best, so I am not currently using swap, but I figure this should somehow be possible?

I need to store the real and imaginary parts of the filtered images for each scale and orientation, as well as the corresponding wavelets, for reconstruction purposes. For small images I keep them all in memory for extra speed.

As for memory usage: I already

  • store everything no more than once,
  • store only what is needed,
  • cut any duplicate entries or reservations,
  • pass arguments by reference only,
  • use pointers instead of temporary objects,
  • free memory as soon as it is not needed, and
  • limit the number of calculations to the absolute minimum.

As with most data processing tools, speed is of the essence. As long as there is enough memory, the tool is about three times faster than the same implementation in Matlab.

But once I run out of memory, nothing works anymore. Unfortunately, most of the images I train the algorithm on are huge (the raw data is 4096x4096 double entries, even more after symmetric padding), so I hit the ceiling quite often.

Would it be bad practice to temporarily write data that is not needed for the current computation / processing step from memory to disk?

  • What approach / data format would be most appropriate for this?
  • I was thinking about using quickXML to read and write XML to a binary file and then read back only the required data. Will this work? (A rough sketch of what I mean follows this list.)
  • Is a memory-mapped file what I need? https://en.wikipedia.org/wiki/Memory-mapped_file
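
To illustrate what I mean by reading back only the required data, here is a rough sketch using a plain binary file for simplicity; the file name, sizes, and row index are just placeholders, and error checks are omitted:

    // Sketch: dump one plane of doubles to a raw binary file, then later
    // read back only a single row of it by seeking to its offset.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t rows = 4096, cols = 4096;
        std::vector<double> image(rows * cols, 0.0);   // one full-resolution plane

        // Write the whole plane as raw doubles.
        std::FILE* f = std::fopen("plane.bin", "wb");
        std::fwrite(image.data(), sizeof(double), image.size(), f);
        std::fclose(f);

        // Later: read back only row 100 (the "required data") by seeking to it.
        std::vector<double> row(cols);
        f = std::fopen("plane.bin", "rb");
        std::fseek(f, static_cast<long>(100 * cols * sizeof(double)), SEEK_SET);
        std::fread(row.data(), sizeof(double), cols, f);
        std::fclose(f);
    }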

I know this will cost some performance, but it is more important that the software runs smoothly and does not freeze.

I know there are libraries out there that can do wavelet image analysis, so please spare me the "Why reinvent the wheel, just use XYZ instead" comments. I am using very specific wavelets, I am required to implement this myself, and I am not allowed to use external libraries.

+3




3 answers


Yes, writing data to disk to save memory is bad practice.

Generally, there is no need to manually write data to disk to save memory, unless you reach the limits of what you can address (4 GB on 32-bit machines, much more on 64-bit machines).



The reason for this is that the OS is already doing exactly that. It is very possible that your own solution would be slower than what the OS does. Read the Wikipedia article on paging and virtual memory if you are not familiar with the concept.
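
If you want to check how close you are to those limits on a given machine, here is a minimal sketch, assuming Linux/glibc (where the _SC_PHYS_PAGES extension to sysconf is available):

    // Query the page size and amount of physical RAM the OS is paging for you.
    #include <unistd.h>
    #include <cstdio>

    int main() {
        long page_size  = sysconf(_SC_PAGESIZE);   // bytes per page
        long phys_pages = sysconf(_SC_PHYS_PAGES); // pages of physical RAM (glibc extension)
        std::printf("page size: %ld bytes\n", page_size);
        std::printf("physical RAM: %.1f GiB\n",
                    static_cast<double>(page_size) * phys_pages / (1024.0 * 1024.0 * 1024.0));
    }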

+2




Look into using mmap and munmap to bring images (and intermediate results) into your address space and discard them when you no longer need them. mmap lets you map the contents of a file directly into memory: no more fread/fwrite, just direct memory access. Writes to the mapped memory region are also written back to the file, and getting that intermediate state back later is no harder than mmap-ing the file again.

Great benefits:



  • no encoding into a bloated format like XML
  • great for transient results such as matrices that are represented in contiguous regions of memory
  • dead easy to implement
  • fully delegates to the OS the decision of when to swap data in and out
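
A minimal sketch of the idea, assuming Linux; the scratch file name and matrix size are placeholders:

    // File-backed matrix of doubles via mmap/munmap. Writes land in the file;
    // the kernel decides which pages live in RAM and which stay on disk.
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t rows = 2048, cols = 2048;
        const std::size_t bytes = rows * cols * sizeof(double);

        int fd = open("scratch.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { std::perror("open"); return 1; }
        if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { std::perror("ftruncate"); return 1; }

        double* data = static_cast<double*>(
            mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
        if (data == MAP_FAILED) { std::perror("mmap"); return 1; }

        data[0] = 42.0;          // use it like an ordinary array, no fread/fwrite
        munmap(data, bytes);     // discard when no longer needed
        close(fd);               // mmap the file again later to get the data back
        return 0;
    }

MAP_SHARED is what makes the writes go back to the file; with MAP_PRIVATE they would stay in memory only.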
+2




This doesn't solve your main problem, but: are you sure you need to do everything in double precision? You may not be able to use wavelets with integer coefficients, but storing the image data itself in doubles is usually quite wasteful. Also, 4k images are not that large... I am assuming you are actually using frames of some kind, so you have redundant entries, otherwise your numbers don't seem to add up (are you storing them sparsely?), or maybe you are just using a lot of them at once.
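
For a rough idea of the footprint (just arithmetic, no image code):

    // Back-of-the-envelope size of one 4096x4096 plane at different precisions.
    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t n = 4096u * 4096u;                                           // pixels per plane
        std::printf("double:   %zu MiB\n", n * sizeof(double) / (1024 * 1024));        // 128 MiB
        std::printf("float:    %zu MiB\n", n * sizeof(float) / (1024 * 1024));         //  64 MiB
        std::printf("uint16_t: %zu MiB\n", n * sizeof(unsigned short) / (1024 * 1024)); //  32 MiB
    }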

As for "should I write to disc"? This can help, especially if you are getting 4x (or more) magnification while accepting double precision image data. You can answer it yourself, at least just measure the load time and compare it with your computation time to see if it is worth doing. The wavelet itself should be very cheap, so I assume you are mostly dominating your learning algorithm. In this case, go ahead and ditch the original data or whatever until you need it.

0








