DocumentDB: Index Precision for a GUID Property

Let's say we have a unique GUID / UUID value in our documents:

[
  {
    "id": "123456",
    "Key": "117dfd49-a71d-413b-a9b1-841e88db06e8",
    "Name": "Kaapstad"
  },
  ...
]


We only ever query this property for equality; range queries and ORDER BY are not needed. For example:

SELECT * FROM c where c.Key = "117dfd49-a71d-413b-a9b1-841e88db06e8"


Below is the index definition. It is a hash index (since no range queries will be run) on the String data type (since JavaScript has no native GUID type):

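using System.Collections.ObjectModel;   // Collection<T>
using Microsoft.Azure.Documents;        // IncludedPath, HashIndex, DataType, Index

// collection: the DocumentCollection being configured.
// Hash index on the Key property: equality lookups only, GUIDs stored as strings.
// Precision is currently -1; which value is best is the question below.
collection.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath {
        Path = "/Key/?",
        Indexes = new Collection<Index> {
            new HashIndex(DataType.String) { Precision = -1 }
        }
    });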


But what's the best indexing precision for this?

This MSDN page gives no hint as to which precision is most appropriate for such a value:

Index precision configuration is more useful with string ranges. Because strings can be of arbitrary length, the choice of index precision can affect the performance of string range queries and the amount of storage the index requires. String range indexes can be configured with a precision of 1-100 or -1 ("maximum"). If you want to run ORDER BY queries against string properties, you must specify a precision of -1 for the corresponding paths.


1 answer


You can fine-tune the index precision based on the number of documents you expect to contain the path for your property (in your example, the property Key).

For a hash index, precision is the number of bytes used for the hash of the property value. Lowering the precision therefore reduces the storage the index requires, while raising it helps protect against hash collisions in the index.
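The relationship between precision and the number of distinct hash values can be sketched in C#. This is a minimal illustration, not DocumentDB code: `HashPrecisionMath`, `DistinctHashValues` and `MinimumPrecisionFor` are hypothetical names, and the 1 to 8 byte range is an assumption about the service rather than something stated in this thread. It shows how many values each precision can represent and the smallest precision whose hash space is at least as large as an expected document count.

```csharp
using System;

// Hypothetical helper (not part of the DocumentDB SDK) for reasoning about
// hash index precision, assuming precision = number of bytes in the hash.
public static class HashPrecisionMath
{
    // Number of distinct values a p-byte hash can represent: 2^(8p).
    // Returned as a double so that p = 8 (2^64) does not overflow a long.
    // The 1..8 range is an assumed limit, not taken from the thread.
    public static double DistinctHashValues(int precisionBytes)
    {
        if (precisionBytes < 1 || precisionBytes > 8)
            throw new ArgumentOutOfRangeException(nameof(precisionBytes), "Expected 1..8 bytes.");
        return Math.Pow(2, 8 * precisionBytes);
    }

    // Smallest precision whose hash space holds at least expectedDocuments values.
    // This only rules out the pigeonhole case (a guaranteed collision); random
    // collisions can still occur with fewer documents than that.
    public static int MinimumPrecisionFor(long expectedDocuments)
    {
        for (int p = 1; p <= 8; p++)
        {
            if (DistinctHashValues(p) >= expectedDocuments)
                return p;
        }
        return 8; // unreachable for any long value, kept for the compiler
    }

    public static void Main()
    {
        // Print the hash space for the smaller precisions.
        for (int p = 1; p <= 4; p++)
            Console.WriteLine($"precision {p}: {DistinctHashValues(p):N0} distinct values");

        // Smallest precision that avoids a guaranteed collision for 30 million documents.
        Console.WriteLine($"30,000,000 documents: precision {MinimumPrecisionFor(30_000_000)}");
    }
}
```

The trade-off the answer describes is visible here: each extra byte of precision multiplies the hash space by 256, at the cost of one more byte per index entry.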

For example, suppose the hash index precision on the path foo is 3.
3 bytes = 3 × 8 = 24 bits.

24 bits can represent 2^24 = 16,777,216 distinct values.

By the pigeonhole principle, you are guaranteed a hash collision once more than 16,777,216 documents have a property foo. When hashes collide, DocumentDB must then check each candidate document that shares the hash to find the ones whose value actually matches. For example, if you have 30,000,000 documents with a property foo, you can expect an equality lookup to scan about 2 documents on average.
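The "about 2 documents" figure follows from dividing the document count by the number of 3-byte hash values, assuming the hashes are spread evenly across that space:

```latex
\frac{30\,000\,000}{2^{24}} = \frac{30\,000\,000}{16\,777\,216} \approx 1.79
```

So each hash value is shared by roughly two documents on average, which is the number of candidates a lookup would have to check.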
