Sharing memory across threads?

I have an application that processes work items from a to-do queue on multiple threads. I have no influence on what gets queued or in what order (it is loaded externally by the user). A single work item can take anywhere from a few seconds to several hours to process and should not be interrupted once started. In addition, a single work item can consume anywhere from a few megabytes to 2 GB of memory. That memory consumption is my problem: I run as a 64-bit process on an 8 GB machine with 8 parallel threads, and if all of them hit a worst-case work item at the same time, I run out of memory. I am wondering what the best way around this is.

  • Schedule conservatively and run only 4 threads. The worst case is no longer a problem, but we waste a lot of parallelism and make the average case much slower.
  • Before starting work on a new item, each thread checks the available memory (or rather, the total memory allocated by all threads) and only starts when more than 2 GB is left. It re-checks periodically, in the hope that other threads finish their memory hogs so it can eventually start.
  • Try to predict how much memory the items in the queue will require (difficult) and plan accordingly. We could reorder the queue (overriding the user's choice), or simply adjust the number of running worker threads.
  • More ideas?

I am currently leaning towards number 2 because it seems simple to implement and solves most cases. However, I'm still wondering: what are the standard ways to deal with situations like this? The operating system has to do something very similar at the process level, after all ...

Respectfully,

Sören

      

+2




3 answers


I continued the discussion on Herb Sutter's blog, and it provoked some very helpful comments from the readers there. If you're interested, head over to Sutter's Mill.

Thanks for all the suggestions!



Sören

      

+1




So your current worst-case memory usage is 16 GB. With only 8 GB of RAM, you're lucky to have 6 or 7 GB left after the OS and system processes take their share. So on average you are already going to be paging on a moderately loaded system. How many cores does the machine have? Do you have 8 worker threads because it is an 8-core machine?

Basically, you can either decrease memory consumption or increase the available memory. Your option 1, running only 4 threads, underutilizes the CPU and can cut your throughput in half, so it's definitely not optimal.

Option 2 is possible but risky. Memory management is very complex, and querying the available memory is no guarantee that you will actually be able to allocate that amount (without causing paging). A burst of disk I/O can cause the system to grow its cache, a background process can start up and page in its working set, and any number of other factors can intervene. For these reasons, the smaller the available memory, the less you can rely on the figure. On top of that, memory fragmentation can cause problems over time.



Option 3 is interesting, but can easily lead to CPU under-utilization. If you get a run of jobs with high memory requirements, you may only be able to run a few threads, and you're back in the same situation as option 1, with idle cores.

So, taking the "reduce consumption" strategy: do you really need all the data in memory at once? Depending on the algorithm and its data-access pattern (random versus sequential, for example), you may be able to load the data incrementally. More esoteric approaches might involve compression, depending on your data and algorithm (but in reality this is probably a waste of effort).

Then "increase the available memory". From a price/performance standpoint, you should seriously consider buying more RAM. Sometimes investing in hardware is cheaper than the development time needed to achieve the same end result: for example, you can put in 32 GB of RAM for a few hundred dollars, and it will immediately improve things without adding any complexity to the solution. That aside, you can profile the application to see where the software can be made more memory-efficient.

+3




It's hard to suggest solutions without knowing exactly what you are doing, but how about considering:

  • See if your processing algorithm can access the data in smaller chunks without loading the entire work item into memory.
  • Consider developing a service solution so that the work is done by another process (perhaps a web service). This way you can scale your solution to run multiple servers, perhaps using a load balancer to distribute work.
  • Do you save incoming work items to disk before processing them? If not, they probably should be anyway, especially if it may be a while before the processor gets to them.
  • Is memory usage proportional to the size of the incoming work item, or otherwise easy to calculate? Knowing this will help you decide how to plan your processing.

Hope this helps!

0








