Minimizing memory overhead using C ++ containers (std :: map and std :: vector are too expensive)

I expect that you will be processing a huge number of data records, resulting in about 20 uint8_t

having millions of pairs <int, struct>

associated with each one (ordered by int

). These pairs are fairly lightweight at ~ 10 bytes and require dynamic allocation.

I originally used std::map<uint8_t, std::vector<int, struct>>

, but after looking into the overhead associated with vectors, namely capacity()

in

3 machine words in total + sizeof(element)

*capacity()

as seen here ; capacity()

"usually occurs for twice the actual number of elements," which appears to be harmful.

I could use std :: map instead of a vector, however the ~ 32 bytes per node overhead also becomes very expensive for such light weight pairs.

I'm not familiar with Boost and other C ++ libraries, so I was wondering if anyone could suggest a solution where I could avoid manually allocating heap memory?


Edit: To clarify the following points in the comments, the stored structure will contain 3 shorts (to begin with) and no additional data structures. I expect the length vector

to be no more than 1.5 * 10 ^ 8 and understand that this goes up to ~ 1.4GB (thanks @dyp).

I believe the question is rather how to control the vector capacity()

in such a way that reallocation through reserve()

is minimized. I'm also not sure about the efficiency shrink_to_fit()

(C ++ 11)

+3


source to share


1 answer


Following on from @NielKirk's point about std :: vector <gt; instead of a keymap, with 256 possibilities, you can also consider std :: array <gt; (or even a C-style array) for keys.

Regarding std :: pair <int, struct> elements, the original implementation had them as members of std :: vector <std :: pair <int, struct →> and you said

I could use a std :: map instead of a vector, however the ~ 32 bytes per node overhead also becomes very expensive for such light weight pairs.

which means that the element of the int

element is unique since you didn't mention std :: multimap. You can take a look at Google sparsehash

( http://code.google.com/p/sparsehash/ ). On the project home page:

Implementing hash_map with maximum memory efficiency. 2 bits / overhead! The SparseHash library contains several hashmap implementations, including implementations that optimize space or speed.

These hash table implementations are similar in API to the SGI hash_map class and the tr1 unordered_map class, but with different performance characteristics. Easily replace hash_map or unordered_map with sparse_hash_map or dense_hash_map in C ++ code.

I have used it before and have never had a problem with it. Your keys uint8_t

can be indexed in the collection (std :: vector / std :: array / C-array) of KCH hashmaps. If you wanted, you could even define KCH as a collection of objects, each containing a hash map, so each KCH [i] can implement a convenient std::pair<int, struct>

object interface for that key. You will have a default "bad key" element for non-key items in the collection referencing: a) one empty dummy layout, or b) a "bad key object" that handles an unexpected key value appropriately.



Something like that:

typedef std::pair<int, struct>                            myPair;
typedef google::sparse_hash_map<int, myPair>              myCollectionType;
typedef google::sparse_hash_map<int, myPair>::iterator    myCollectionIter;

myCollectionType dummyHashMap;
std:array<myCollectionType, 256> keyedArray; 

      

Initialize all elements keyedArray

before dummyHashMap

and then populate with various hash maps for valid keys.

Likewise, with containing objects:

class KeyedCollectionHandler {
public:
    virtual bool whatever(parm);
    ...

private:
    myCollectionType collection;
};

class BadKeyHandler : public KeyedCollectionHandler 
{
public:
    virtual bool whatever(parm){
        // unknown or unexpected key, handle appropriately
    }
    ...
};

BadKeyHandler badKeyHandler;

      

Initialize 256 array elements with key to badKeyHandler

, fill KeyedCollectionHandler

objects for good key values.

+2


source







All Articles