Backing scheme for non-volatile C++ map constructs

I often use std::map or tr1::hashed_maps in my C++ code. I have an upcoming project where I would normally use such constructs by default; however, this project requires the maps to be non-volatile. That is, after the application terminates (whether shut down cleanly or killed unexpectedly, e.g. by a power outage), the map data must be safely stored on disk and restored on the next run of the application. Note that it is not a requirement that every bit of data up to the moment of power-off be stored, only the state as of a few seconds before.

The application must nevertheless be high performance in terms of loads from and stores to the maps. Obviously "high performance" is subjective, but expect on the order of millions of loads/stores to the map per second.

This makes me "guess" that I should be using an SQL database; however, I am not experienced with databases and fear a significant performance hit going from simple C++ containers to a full SQL infrastructure. Would SQL's in-memory "caching" work in a way that reduces the performance impact?

Alternatively, the simple answer might be to frequently (say every 10-30 seconds) write (serialize) a copy of the maps to disk. Depending on the size of the maps, which are going to be large (at least a million records), this may be unwise.

Any recommendations?




5 answers

A plain C++ approach is fine if you have no plans to extend the functionality in the future. A middle ground that might well suit your needs is a key-value store such as Redis or Cassandra. They handle persistence and interruptions transparently, and can also spread storage across multiple machines if one is not enough. Their performance is very good; in some cases they can even outperform C++ code. A full-blown SQL database will be too slow for your purposes unless you run it on multiple machines.



Use the best hammer for the nail, even though you like your C++ hammer the most (I would be in the same boat).

It looks like a database will be the best choice in terms of performance and data integrity. Databases are designed to handle the scenarios you describe in your post.

The two things I see you need to do are:

  • Create a robust database model for the information you want to store. I am not an expert on this, but I do know that getting this right is important.
  • Do some research on a good C++ database wrapper. That way you can leave the database details to the library and focus on where you are strongest.


With millions of stores to the map per second, an SQL setup that is fast enough becomes quite costly for something you seem to treat as an afterthought. A key-value store might suit your application better, but if you really run into performance constraints, you can simply write a log of the updates you make to your in-memory store. You can then rebuild the in-memory store from the log to meet your persistence requirement.
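The update-log idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `LoggedMap`, the fixed 64-bit key/value record layout, and the raw binary format (which is not portable across machines with different endianness) are all hypothetical choices, not something prescribed by the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Hypothetical sketch: an in-memory map whose updates are also appended to a log file.
class LoggedMap {
public:
    explicit LoggedMap(const std::string& path) {
        // Rebuild the in-memory state from whatever the log already contains.
        const std::streamoff valid_bytes = replay(path);
        // Open for update without truncating; create the file if it does not exist yet.
        log_.open(path, std::ios::in | std::ios::out | std::ios::binary);
        if (!log_.is_open()) {
            log_.open(path, std::ios::out | std::ios::binary);
        }
        assert(log_.is_open());
        // Continue after the last complete record, overwriting any torn tail.
        log_.seekp(valid_bytes, std::ios::beg);
    }

    void put(std::uint64_t key, std::uint64_t value) {
        map_[key] = value;
        const Record r{key, value};
        log_.write(reinterpret_cast<const char*>(&r), sizeof r);
    }

    // Hand buffered records to the OS; intended to be called every few seconds.
    void flush() { log_.flush(); }

    const std::map<std::uint64_t, std::uint64_t>& data() const { return map_; }

private:
    struct Record {
        std::uint64_t key;
        std::uint64_t value;
    };

    // Applies every complete record in order; returns the number of bytes consumed.
    std::streamoff replay(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        std::streamoff valid_bytes = 0;
        Record r;
        // A record cut short by a crash fails the read and is ignored.
        while (in.read(reinterpret_cast<char*>(&r), sizeof r)) {
            map_[r.key] = r.value;
            valid_bytes += static_cast<std::streamoff>(sizeof r);
        }
        return valid_bytes;
    }

    std::fstream log_;
    std::map<std::uint64_t, std::uint64_t> map_;
};
```

Note that `flush()` only hands the data to the operating system; surviving a power outage would additionally need an OS-level sync (for example POSIX `fsync`), which standard C++ streams do not expose. The log also grows without bound, so a real version would periodically compact it into a snapshot.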



You can still use your map, wrapped in an object that handles all the map operations. Operations that modify the map would update the in-memory map and also update the on-disk storage.

Your next problem would then be figuring out which storage model suits your data best, e.g. an SQL database, or perhaps a log containing all updates, or perhaps a fixed-size binary file with its own indexing scheme.

If there is also a need to share the database between multiple users, each of whom may be updating it, then you will also need to add a mechanism to synchronize your map... by then it might be easier to just query the database directly. In any case, by that time you will have an object that handles all operations on the data, and you will only need to replace the internals of that object.
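As a minimal sketch of the wrapper described above, assuming string keys and values: the names `Storage`, `InMemoryStorage` and `PersistentMap` are hypothetical, and the toy backend merely stands in for a real log file, key-value store, or SQL database. The point is only that callers talk to one object while the backend behind it can be swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical backend interface: a log file, a key-value store, or SQL could implement it.
class Storage {
public:
    virtual ~Storage() = default;
    virtual void store(const std::string& key, const std::string& value) = 0;
    virtual void erase(const std::string& key) = 0;
    virtual std::map<std::string, std::string> load_all() = 0;
};

// Toy backend that keeps everything in memory, standing in for real disk storage.
class InMemoryStorage : public Storage {
public:
    void store(const std::string& key, const std::string& value) override {
        saved_[key] = value;
    }
    void erase(const std::string& key) override { saved_.erase(key); }
    std::map<std::string, std::string> load_all() override { return saved_; }

private:
    std::map<std::string, std::string> saved_;
};

// The object that handles all map operations and mirrors modifications to the backend.
class PersistentMap {
public:
    explicit PersistentMap(Storage& storage)
        : storage_(storage), map_(storage.load_all()) {}

    void set(const std::string& key, const std::string& value) {
        map_[key] = value;           // update the in-memory map
        storage_.store(key, value);  // and the backing storage
    }

    bool erase(const std::string& key) {
        if (map_.erase(key) == 0) return false;
        storage_.erase(key);
        return true;
    }

    // Reads are served from memory and never touch the backend.
    const std::string* find(const std::string& key) const {
        const auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return map_.size(); }

private:
    Storage& storage_;
    std::map<std::string, std::string> map_;
};
```

Switching to a different storage model later would then mean writing another `Storage` implementation, leaving the code that uses `PersistentMap` unchanged.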



I am not a user of your application, but storing large amounts of data as it closes is a bad idea for several reasons:

  • When someone wants to close an application, they want it to close fairly quickly rather than hang for ages. If it doesn't close, they will probably kill it.

  • If it crashes, your data will not be saved.

Thus, periodic saves are a much better idea, and for that you probably want to mark entries as dirty while they are not yet saved, which means you may need some extra data to track the dirty records. This can be done with a simple set or vector of the keys whose data is dirty; periodically save those entries and clear their dirty status.

On close, you will write out any remaining dirty records, but there won't be many.

A lot of this depends on how often your maps change.

Remember also that user interaction should always have the highest priority, and any saving of your dirty entries should happen on a low-priority background thread.
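The dirty-key bookkeeping described above might look something like the sketch below. The class name `DirtyTrackingMap` and the `save` callback are hypothetical, and the sketch is single-threaded: a version that flushes from a background thread would need a mutex (or would swap the dirty set out under a lock) before saving.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>

class DirtyTrackingMap {
public:
    // `save` is a hypothetical callback that persists one key/value pair.
    using SaveFn = std::function<void(const std::string&, const std::string&)>;

    explicit DirtyTrackingMap(SaveFn save) : save_(std::move(save)) {}

    void set(const std::string& key, const std::string& value) {
        map_[key] = value;
        dirty_.insert(key);  // a key rewritten many times is still saved only once
    }

    // Saves every dirty entry, clears the dirty status, and returns how many were saved.
    // Meant to be called periodically and once more when the application closes.
    std::size_t flush() {
        std::size_t saved = 0;
        for (const std::string& key : dirty_) {
            save_(key, map_.at(key));
            ++saved;
        }
        dirty_.clear();
        return saved;
    }

    std::size_t dirty_count() const { return dirty_.size(); }

private:
    SaveFn save_;
    std::map<std::string, std::string> map_;
    std::unordered_set<std::string> dirty_;
};
```

Because only the keys are tracked, the cost of a periodic save depends on how many distinct entries changed since the last flush rather than on the total size of the map.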


