Hashmap implementation in C++: hashing for a templated data type

I've been using the STL unordered_map lately, and while it works well, I don't quite understand how the hash function works, given that the key type is specified as a template parameter. To better understand this data structure, I implemented my own little Hashmap class in C++:

Hashmap interface:

#ifndef _HASHMAP_H_
#define _HASHMAP_H_

#include <cstdio>
#include <cstdlib>
#include <string>
#include <iostream>
#include <vector>


//Beginning of Hashmap class definition

template <class Key, class Value>
class Hashmap{
private:

int mappedElementCount;



public:
explicit Hashmap();
virtual ~Hashmap();


virtual void test();

virtual int hash(Key*);

int* getSize();

void putKVPair(Key*,Value*);

void clearMap();


//When we use these methods, we'll want a linear vector of keys and values to 
    //iterate over, so vector is good here
std::vector<Key>* getKeys();
std::vector<Value>* getValues();

}; //end hashmap class definition
#endif /*_HASHMAP_H_*/

      

Hashmap implementation:

#include "Hashmap.h"

template<class Key,class Value> Hashmap<Key,Value>::Hashmap(){
mappedElementCount = 0;
}
template<class Key,class Value> Hashmap<Key,Value>::~Hashmap(){
printf("\nDestroying the base Hashmap object!\n");
}

template<class Key,class Value> void Hashmap<Key,Value>::test(){
printf("The size of our Key is %zu and the size of our Value is %zu\n", sizeof(Key), sizeof(Value));
}


template<class Key,class Value> int Hashmap<Key,Value>::hash(Key* k_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;

        //TODO: How do we generate a hash signature when we don't know what data type 
        //we're going to be working with?

    return hashval % mappedElementCount;

}

template<class Key,class Value> std::vector<Key>* Hashmap<Key,Value>::getKeys(){
//TODO: prepare a vector initialized with all Key objects and return it here
return keys;    
}

template<class Key,class Value> std::vector<Value>* Hashmap<Key,Value>::getValues(){
//TODO: prepare a vector initialized with all Value objects and return it here
return values;  
}

template<class Key,class Value> int* Hashmap<Key,Value>::getSize(){
return &mappedElementCount;
}

template<class Key,class Value> void Hashmap<Key,Value>::putKVPair(Key* k, Value* v){
    //TODO: implement hashing of the key object k to determine
    //the address of the value object v

    //first step, generate a hash from our key
    int tempHash = hash(k);

       //TODO: store the Value at an address given by or influenced by tempHash

    //If all was successfully completed, increment the mapped records counter
    mappedElementCount++;
}



template<class Key,class Value> void Hashmap<Key,Value>::clearMap(){
//TODO: implement a cascading chain of deallocation of stored objects within the 
    //hashmap
//MAYBE-- only if we create new objects rather than just mapping reference 
    //associations,
//which is really the goal here...  In the latter case, just empty the Hashmap 
    //itself
}

      

One possible OOP way to solve this problem is to use Hashmap as the base class and provide derived classes that have well-known Key data types such as the following Stringmap:

Stringmap interface:

#ifndef _STRINGMAP_H_
#define _STRINGMAP_H_

#include "Hashmap.h"

template <class Value>
class Stringmap:public Hashmap<std::string,Value>{
private:

public:
//Con/de 'structors
explicit Stringmap();
~Stringmap();

//Here we know our Key will be of type std::string
//so we can generate our hash sig by char values
    //Override hash from the base class
int hash(std::string*);

//override test from base class
void test();


};
#endif /*_STRINGMAP_H_ def*/

      

Stringmap implementation:

#include "Stringmap.h"

template<class Value> Stringmap<Value>::Stringmap():Hashmap<std::string,Value>(){

}
template<class Value> Stringmap<Value>::~Stringmap(){
printf("\nDestroying the derived stringmap object!\n");
}

template<class Value> void Stringmap<Value>::test(){
printf("The size of our Value is %zu\n", sizeof(Value));
}

template<class Value> int Stringmap<Value>::hash(std::string* str_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;


    /* for each character, we multiply the old hash by 31 and add the current
     * character.  Remember that shifting a number left is equivalent to
     * multiplying it by 2 raised to the number of places shifted.  So we
     * are in effect multiplying hashval by 32 and then subtracting hashval.
     * Why do we do this?  Because shifting and subtraction are much more
     * efficient operations than multiplication.
     */
    for(int i=0;i<str_ptr->length();i++) {
        hashval = (*(str_ptr))[i] + ((hashval << 5) - hashval);
    }

    /* we then return the hash value mod the hashmap size so that it will
     * fit into the necessary range
     */
    return hashval % (*(Hashmap<std::string,Value>::getSize()));

}

      

So the question is: is it possible to generate a hash signature when the data type to be hashed is not known in advance? If so, how? Looking at the std::hash docs, it seems that the C++ standard simply defines a hash function for each primitive data type, and also for T* (for any type T). What's missing is how this hashing is implemented for the primitive data types and, moreover, how it is implemented for a generic T*. I suppose I could just call hash(Key) and hope for the best, but it would be nice to understand what's going on behind the scenes.

Thanks, CCJ

2 answers


std::unordered_map takes two explicit template parameters (Key and Value) and also has several hidden template parameters with default values, among them the hash function, which defaults to std::hash<Key>.

This STL hash function std::hash<Key> takes a Key and returns a std::size_t. It is already specialized for all integral types and for std::string. From this reference site:

The hash template defines a function object that implements a hash function. Instances of this function object define an operator() that:

  • Accepts a single parameter of type Key.
  • Returns a value of type size_t that represents the hash value of the parameter.
  • Does not throw exceptions when called.
  • For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2).
  • For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0 / std::numeric_limits<size_t>::max().

The hash template is both CopyConstructible and Destructible. The unordered associative containers std::unordered_set, std::unordered_multiset, std::unordered_map, and std::unordered_multimap use specializations of the template std::hash as the default hash function.
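To make the quoted requirements concrete, here is a minimal sketch of how the function object is constructed and called. The helper names hashOf and bucketOf are hypothetical and exist only for this illustration; the sketch assumes a C++11 compiler and a key type that has a std::hash specialization.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not from the original post): hashes any key type
// for which a std::hash specialization exists.
template <class Key>
std::size_t hashOf(const Key& k) {
    // Construct the function object, then invoke its operator().
    return std::hash<Key>()(k);
}

// Hypothetical helper: reduces the hash to a bucket index in [0, bucketCount).
// The caller must pass a non-zero bucketCount.
template <class Key>
std::size_t bucketOf(const Key& k, std::size_t bucketCount) {
    return hashOf(k) % bucketCount;
}
```

Equal keys are guaranteed to produce equal hashes within one program run, but the actual numeric values differ between standard library implementations, so nothing should depend on them.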



The link ends with this quote:

Actual hash functions are implementation dependent and are not required to meet any quality criteria other than those listed above.

So you can look at your own system's implementation, but that guarantees nothing about the implementations on other systems.


There is a template std::hash<T> that is specialized for various types and that you can specialize for your own types.

By default, std::unordered_map<Key, Value> just delegates hashing to std::hash<Key> (or you can specify a different hash function as a template argument).
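Both options can be sketched as follows. The types Point and StringLengthHash are hypothetical, invented purely for illustration, and the way the member hashes are combined is a simple placeholder rather than a recommended mixing function.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical user-defined key type. Keys also need operator==.
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Option 1: specialize std::hash for the user-defined type.
namespace std {
template <>
struct hash<Point> {
    size_t operator()(const Point& p) const {
        // Illustrative combination of the member hashes; not tuned for quality.
        return hash<int>()(p.x) * 31u + hash<int>()(p.y);
    }
};
}  // namespace std

// Option 2: a separate functor passed as the third template argument.
struct StringLengthHash {
    std::size_t operator()(const std::string& s) const {
        return s.size();  // deliberately poor, but a valid hash function
    }
};

// Uses the std::hash<Point> specialization implicitly.
typedef std::unordered_map<Point, std::string> PointNames;

// Overrides the default hash with the custom functor.
typedef std::unordered_map<std::string, int, StringLengthHash> LengthHashedCounts;
```

A weak hash such as StringLengthHash keeps the container correct, because colliding keys are still distinguished by operator==, but it degrades lookup performance.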



Thus, std::unordered_map doesn't need to know anything about the hashing mechanism.
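The same delegation can be applied to a hand-written map like the one in the question. The following is a minimal sketch under stated assumptions, not the question's Hashmap class: SimpleHashmap, its fixed bucket count, and its separate-chaining layout are all hypothetical choices made for the illustration.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Sketch only: a map with a fixed number of buckets and separate chaining.
// The hash function is a template parameter that defaults to std::hash<Key>,
// so the container itself never needs to know how a Key is hashed.
template <class Key, class Value, class Hash = std::hash<Key>>
class SimpleHashmap {
public:
    explicit SimpleHashmap(std::size_t bucketCount = 16)
        : buckets_(bucketCount == 0 ? 1 : bucketCount), size_(0) {}

    // Inserts the pair, or overwrites the value if the key already exists.
    void put(const Key& key, const Value& value) {
        auto& chain = buckets_[indexFor(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.push_back(std::make_pair(key, value));
        ++size_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const Value* find(const Key& key) const {
        const auto& chain = buckets_[indexFor(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }

private:
    // Delegates to the hash function object, then reduces to a bucket index.
    std::size_t indexFor(const Key& key) const {
        return hasher_(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<Key, Value>>> buckets_;
    std::size_t size_;
    Hash hasher_;
};
```

Note that the modulus is taken over the number of buckets, which is never zero here, rather than over the number of stored elements.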

How std::hash is implemented is not specified. However, I think it is reasonable to assume that any decent compiler will provide a quality implementation. One thing to keep in mind is that std::hash<char*> does not hash the C string; it only hashes the pointer value (been there :))
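The pointer pitfall can be sketched like this. The helper names hashPointer and hashContents are hypothetical; converting to std::string is just one straightforward way to hash the characters instead of the address.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: hashes the pointer value (the address),
// not the characters it points to.
inline std::size_t hashPointer(const char* s) {
    return std::hash<const char*>()(s);
}

// Hypothetical helper: hashes the characters by converting the
// C string to a std::string first.
inline std::size_t hashContents(const char* s) {
    return std::hash<std::string>()(std::string(s));
}
```

Two separate buffers holding the same text always get the same result from hashContents, while hashPointer sees two different addresses and will generally return different values for them.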
