Minimizing memory overhead using C++ containers (std::map and std::vector are too expensive)
I expect to be processing a huge number of data records, resulting in about 20 uint8_t keys, each with millions of <int, struct> pairs associated with it (ordered by int). These pairs are fairly lightweight at ~10 bytes and require dynamic allocation.
I originally used std::map<uint8_t, std::vector<std::pair<int, struct>>>, but after looking into the overhead associated with vectors, namely the 3 machine words in total plus sizeof(element) * capacity() as seen here, where capacity() "usually is twice the actual number of elements," this appears to be harmful.
I could use std::map instead of a vector, however the ~32 bytes of per-node overhead also becomes very expensive for such lightweight pairs.
I'm not familiar with Boost and other C++ libraries, so I was wondering if anyone could suggest a solution that avoids manually allocating heap memory?
Edit: To clarify some points raised in the comments: the stored struct will contain 3 shorts (to begin with) and no additional data structures. I expect the vector length to be no more than 1.5 * 10^8, and I understand that this adds up to ~1.4 GB (thanks @dyp).
I believe the question is rather how to control the vector's capacity() so that reallocation is minimized, e.g. through reserve(). I'm also not sure about the efficiency of shrink_to_fit() (C++11).
Following on from @NielKirk's point about using std::vector<> instead of a map for the keys: with only 256 possibilities, you could also consider std::array<> (or even a C-style array) for the keys.
Regarding the std::pair<int, struct> elements: the original implementation had them as elements of std::vector<std::pair<int, struct>>, and you said

I could use std::map instead of a vector, however the ~32 bytes of per-node overhead also becomes very expensive for such lightweight pairs.

which implies that the int element is unique, since you didn't mention std::multimap. You could take a look at Google sparsehash
( http://code.google.com/p/sparsehash/ ). From the project home page:
An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed.
These hashtable implementations are similar in API to SGI's hash_map class and the tr1 unordered_map class, but with different performance characteristics. It's easy to replace hash_map or unordered_map by sparse_hash_map or dense_hash_map in C++ code.
I have used it before and never had a problem with it. Your uint8_t keys can index into a collection (std::vector / std::array / C array) of hash maps. If you wanted, you could even make it a collection of objects that each contain a hash map, so each element can present a convenient interface over the std::pair<int, struct> objects for that key. Elements of the collection for non-key values would reference either: a) a single empty dummy map, or b) a "bad key" object that handles an unexpected key value appropriately.
Something like this:
typedef std::pair<int, struct> myPair;  // "struct" standing in for your payload type
typedef google::sparse_hash_map<int, myPair> myCollectionType;
typedef google::sparse_hash_map<int, myPair>::iterator myCollectionIter;

myCollectionType dummyHashMap;
std::array<myCollectionType, 256> keyedArray;
Initialize all elements of keyedArray to dummyHashMap, then populate the valid keys with their own hash maps.
Likewise, with containing objects:
class KeyedCollectionHandler {
public:
virtual bool whatever(parm);
...
private:
myCollectionType collection;
};
class BadKeyHandler : public KeyedCollectionHandler
{
public:
virtual bool whatever(parm){
// unknown or unexpected key, handle appropriately
}
...
};
BadKeyHandler badKeyHandler;
Initialize the 256 array elements to badKeyHandler, then fill in KeyedCollectionHandler objects for the good key values.