Use Cases for Aerospike Digest

I am currently developing an Aerospike cluster that models many relationships and is growing very quickly. I found many references in the Aerospike documentation to the digest that is generated when a key is used with the Python client, but none of them show how it is useful in practice.

From the documentation: the digest is the hash of the key. Keys are hashed using the RIPEMD-160 algorithm, which accepts any key length and always returns a 20-byte digest. If you have a long key, say 200 bytes, using the digest for that key improves wire performance by saving 180 bytes per request.
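A minimal sketch of that size reduction (the 200-byte key below is made up, and Python's `hashlib` only exposes RIPEMD-160 when the underlying OpenSSL build includes it; in a real cluster the set name is hashed together with the key, so this only illustrates the fixed 20-byte output, not the exact digest Aerospike would produce):

```python
import hashlib

key = b"user:" + b"x" * 195  # a hypothetical 200-byte application key
try:
    # RIPEMD-160 maps any key length to a fixed 20-byte digest
    digest = hashlib.new("ripemd160", key).digest()
except ValueError:
    digest = None  # ripemd160 is missing from some OpenSSL builds

if digest is not None:
    print(len(key), "->", len(digest))  # 200 -> 20, i.e. 180 fewer bytes on the wire
```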

My question is: does using the digest increase lookup time? And is it worth storing the digest in other sets to model relationships?



2 answers


The digest is not generated when the key is retrieved; it is computed by the client on every call that takes a key, and that digest is what is used to communicate with the cluster and locate the record. By default, even the actual key is not stored with the record's data. So internally, all lookups are performed using digests.
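As a concrete sketch (assuming the `aerospike` Python client package is installed; the namespace, set, and key names here are made up), the client exposes the same hashing it performs internally, and a key tuple can carry the digest instead of the primary key:

```python
try:
    import aerospike

    # Compute the 20-byte digest the client would send for this key
    digest = aerospike.calc_digest("test", "demo", "user-123")

    # A key tuple can address the record by digest in place of the primary key
    key_by_digest = ("test", "demo", None, bytearray(digest))
except ImportError:
    digest = None  # client library not installed; sketch only
```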

From the documentation:



In the application, each record will have a key associated with it. This key is what the application will use to read or write records.

However, when the key is sent to the database, the key (along with its set information) is hashed into a 160-bit digest. Within the database, the digest is used to address the record for all operations.

The key is mainly used in the application, while the digest is mainly used to address the record in the database.

You will not need to use digests directly. When you model the relationship, you will likely also create a secondary index for performance, and secondary indexes work on hashes internally anyway, so it makes little difference whether you store a digest or the key.
You can also try modeling relationships as complex or large data types within a single record.



The digest is also used to locate a record in the Aerospike cluster. The first 12 bits of the digest determine the partition ID, and using the partition ID, the partition map identifies the master and replica nodes in the cluster. So, in essence, the digest is the key to a fast lookup.
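A minimal sketch of that mapping (the little-endian byte order of the first two digest bytes follows the open-source client implementations, so treat it as an assumption; the digest bytes below are made up):

```python
N_PARTITIONS = 4096  # fixed number of partitions in an Aerospike cluster

def partition_id(digest: bytes) -> int:
    # Low 12 bits of the first two digest bytes (little-endian)
    return (digest[0] | (digest[1] << 8)) & (N_PARTITIONS - 1)

example = bytes([0x34, 0x12]) + bytes(18)  # hypothetical 20-byte digest
print(partition_id(example))  # 0x1234 & 0x0FFF = 0x0234 = 564
```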

In addition, the digest is computed using the RIPEMD-160 algorithm, which has a very low collision rate, resulting in an even distribution of data across nodes.
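To see the effect, one can bucket many digests into the 4096 partitions. SHA-1 stands in for RIPEMD-160 below (both produce uniform 20-byte hashes, and RIPEMD-160 is absent from some OpenSSL builds); the key names are made up:

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096
counts = Counter()
for i in range(100_000):
    d = hashlib.sha1(f"key-{i}".encode()).digest()  # stand-in uniform 20-byte hash
    counts[(d[0] | (d[1] << 8)) & (N_PARTITIONS - 1)] += 1

# Every partition receives close to the 100_000 / 4096 ≈ 24 expected records
print(len(counts), min(counts.values()), max(counts.values()))
```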



The only issue with the digest is that it also forms the primary index entry (digest + metadata), and the primary index is always kept in memory, which limits the number of records that can be stored in the cluster.
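Back-of-the-envelope sizing of that limit (the 64 bytes per primary-index entry figure comes from Aerospike's capacity-planning documentation; the record count and replication factor below are made-up examples):

```python
ENTRY_BYTES = 64  # one primary-index entry: digest + record metadata

def index_ram_gib(n_records: int, replication_factor: int = 2) -> float:
    # Every master and replica copy of a record needs its own index entry
    return n_records * replication_factor * ENTRY_BYTES / 2**30

print(round(index_ram_gib(1_000_000_000), 1))  # 1B records, RF 2 -> ~119.2 GiB
```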







