Hash Table - Binary Search Tree Implementation
From Cracking the Coding Interview, p. 71:
Alternatively, we can implement a hash table with BST. Then we can guarantee the search time O (log n), since we can keep the tree balanced. In addition, we can use less space, since a large array does not need to be allocated longer at the very beginning.
I know the basics of linked lists, hash tables and BSTs, but I can't figure out these lines. What does it mean? Will this final data structure be Trie?
source to share
The full text of this section says that the last paragraph is the one you asked about:
A hash table is a data structure that maps keys to values ββfor highly efficient lookups. In a very simple implementation of a hash table, the hash table has an underlying array and hash function. When you want to insert an object and its key, the hash function maps the key to an integer that indicates the index in the array. The object is then saved to that index.
This will generally not work correctly. In the above implementation, the hash value of all possible keys must be unique, or we might accidentally overwrite the data. the array must be extremely large - the size of all possible keys - to prevent such "collisions".
Instead of creating an extremely large array and storing the objects in the index hash (key), we can make the array much smaller and store the objects in the linked list in the index hash (key)% array_length. To get an object with a specific key, we have to search for the associated list for this key.
Alternatively, we can implement a hash table with a binary search tree. Then we can guarantee the search time (log n), since we can keep the tree balanced. Additionally, we can use less space since the large array no longer needs to be allocated at the beginning.
So they talk about using BST (Binary Search Tree) to handle conflicts. In fact, it would be pointless to use BST as the only data structure, since the whole point of a properly configured hash is that the look-up has an order O(1)
much better than O(log n)
from the BST. Also, using BST to fully implement a hash table means it is not a hash table :-)
Note, however, that when colliding in a hash table, frequent calls to them is to have each bucket contain a linked list of its elements. In the degenerate case (all hash items are in the same bucket), you only get a linked list, but O(1)
turns into O(n)
.
So instead of a linked list in each bucket, you have a BST. Then you no longer have the difficulty of finding O(n)
in cases where there are many elements in one bucket (previously mentioned collisions).
You use a hash function to find the bucket in O(1)
, then search for the BST in O(log n)
if there is a collision. At best (one item per bucket) it still is O(1)
. The worst case becomes O(log n)
, not O(n)
.
The only thing that initially touched me about this theory is that they also discuss the fact that a large selection is no longer required. If it is a generic hash / BST combination, you still need to highlight the entire hash table to make it appear irrelevant.
However, from context ("... since the array is large no longer need to be allocated ..."), it looks like they mean that they can make the hash table part of the double data structure smaller, since collisions are more efficient to handle. In other words, instead of a 1000 item hash table with linked lists for collisions, you can get rid of a 100 item hash table because the collision is not that bad for lookup times if you are using BST.
source to share
You combine several terms here.
-
The idea would be to implement a hashtable with both array and BST in a two-story way. It would be possible to add values ββto the hash if there was no collision, but if that were the case, then the problem of getting the colliding element with the BST could be solved.
-
A trie is different; depending on what you tried to store, you won't be able to apply it to the hash function.
source to share
a O (logN) bound would be the worst case scenario in the case of a tree. Let's look at it this way. We insert 45, 33, 55,66,22 and then we will have 45 as the root node, 33 and 55 at level1,22 and 66 at level 2.
So, if you were the hash for the value 45, it would still be an O (1) operation ... Only when you search for nodes in level2 will it be close to O (logN) .... The tree could be the RB tree tree / AVL so that it doesn't degenerate into a linked list ... You lose efficiency, but make up for it in space efficiency.
Another advantage is that you don't have to worry about collisions and then the hash table. http://www.cs.rit.edu/~ib/Classes/CS233_Spring08-09/Slides/Week9_Hashing.pdf
Basically you will have dynamic node allocation and no empty space in the hash table will be wasted on unused buckets ... Let's say you had to use a static hash table with a predefined size (inserts) then it leads to an inefficient implementation space.
source to share