Best algorithm for key / value pair where key is int64 in Delphi, pre Delphi 2009?

I need an algorithm for storing a key / value pair where the key is Int64. I am currently using a sorted IntList (same as TStringList but keeps int64). This gives me O (log n) for search, insert and delete operations. Since I don't need sorted items, this is a bit inefficient. I need some sort of hash table for O (1) operations. The problem is that most of the implementations I can find assume the key is a string. Now I could obviously convert the Int64 key to a string, but that seems wasteful. Any ideas?

I don't know the number of elements before they are put into the data structure.

I should also add that I have implemented the same component in .net using a dictionary and added elements that are much faster in the .net version. When the data structure is set up, traversal and search aren't that bad in comparison, but it's the insert that is killing me.

+1


source to share


4 answers


Delphi 2009 and later added Generics.

So, starting in Delphi 2009, you can implement your key / value pair just like in .NET using TDICTIONARY.



And TDICTIONARY in Delphi uses a hash table and has O (1) operations.

+3


source


You can create a hash table where the hash value is a simple Int64 unit that you add to the hash.

Any good hash table implementation will generate the hash index (by hashing the key) separately from the rest of the logic.



Some of the implementations are: Delphi 5 Hashtable Implementation

+2


source


You can compute the hash value directly from the int64 value, but for that you need to find a hash function that evenly distributes the different int64 values ​​so that you don't get any collisions. This, of course, depends on the values ​​of these keys. If you don't know the number of elements, you most likely also don't know how these int64 values ​​are distributed, so coming up with a good hash function will be difficult.

Assuming your keys are not multiples (for example, addresses that will be multiples of 4, 8, 16, etc.), you can speed things up a bit by using a list of several of these IntList objects and compute the index into that array of lists first. Using the mod operator and a prime number would be an easy way to calculate the index of the list. As always, this is a trade-off between speed and memory consumption.

You can also google for a good implementation of sparse arrays. Julian Bucknall's IIRC EZDSL library has one.

+2


source


Some thoughts, not a full blown solution.

If there is definite proof that the search itself is a bottleneck (don't use your "feel" to find bottlenecks, use a code profiler), I would stick with IntList ... If the time spent in a real search / insert / delete is not less than 20% of the total cpu time, don't even bother.

If you still want a hash table then ...

Don't convert to string. The conversion will allocate a new string from the heap, which is much more expensive than the search itself. Use int64 modulo some cleverly chosen prime as the hash key.

Hashtables will only give you O (1) if they are big enough. Otherwise, you will end up with a large number of records that have the same key. Make it too short, you'll be wasting time searching (linearly!) Through a linked list. Make it too big and you lose memory.

Be aware that hash tables require some form of linked list for all records to use the same key. This linked list must be implemented either by adding a "next" pointer to the payload objects (which breaks the encapsulation - the object does not need to know that it is stored in a hash table), or by allocating a small helper object. This allocation is likely to be much more expensive than O (log) a sorted list.

+1


source







All Articles