Homogeneous vs. heterogeneous collections in DocumentDB

I am using Azure DocumentDB, and all of my NoSQL experience has been with MongoDB. I looked at the pricing model, and the cost is per collection. In MongoDB I would have created 3 collections for what I am using: Users, Companies, and Emails. I noted that this approach would cost $24 per collection per month.

The people I work with have told me that I am doing it wrong. They say I should store all three of these things in a single collection, with a field describing the data type, and that each collection should be related by date or geographic area, so that one part of the world has a smaller portion to search. They also said to:

"Combine different types of documents into one collection and add a field across all of them to separate them when searching, like a type field or something"

I would never have dreamed of doing this in Mongo, as it would make indexing, shard keys, and more difficult to get right.

There might be no fields that overlap between the objects (for example, the email and company objects).

I can do it this way, but I can't find a single example of anyone else doing it this way, which suggests that it might be wrong. I don't need an example, but can someone point me to something that describes this as the correct way to do it? Or, if you do create a single collection for all of your data, what are the advantages and disadvantages of doing so, other than the Azure pricing model?

Any good articles on DocumentDb schema design?

1 answer


Yes. To use Cosmos DB to its full potential, you need to think of a collection as an entire database system, not as a "table" designed to hold just one type of object.

Sharding in Cosmos is extremely simple. You just specify a field that all of your documents will populate and select it as the partition key. If you pick a generic name such as key or partitionKey, you can easily separate the storage of your inbound emails from your users and from anything else by choosing appropriate values.

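// Illustrative only: every document type carries the same Key property,
// which is the field selected as the collection's partition key.
class InboundEmail
{
   public string Key { get; set; } = "EmailsPartition";
   // other properties
}

class User
{
   public string Key { get; set; } = "UsersPartition";
   // other properties
}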

      

This is only a simple example, though. In reality, your partition key values should be even more dynamic. It is important to understand that queries against a known partition are extremely fast. As soon as you have to scan across multiple partitions, you will see much slower and more expensive results.



So, in an application that ingests a lot of user data, keeping a single user's activity together in one partition might make sense for that particular entity.
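As a rough sketch of what a more dynamic partition key could look like, the snippet below derives the key from a user ID so that a user's profile and that user's emails land in the same partition, with a type field to tell the documents apart. The class names, the Type and PartitionKey properties, and the "user-{id}" key format are all hypothetical choices for illustration; they are not from the original example, and no Cosmos DB SDK calls are involved.

```csharp
using System;

// Hypothetical document shapes sharing one collection.
// "Type" is the discriminator; "PartitionKey" is the field assumed to be
// configured as the collection's partition key.
public class UserProfile
{
    public string Id { get; set; } = "";
    public string Type { get; } = "user";
    public string PartitionKey { get; set; } = "";
}

public class UserEmail
{
    public string Id { get; set; } = "";
    public string Type { get; } = "email";
    public string PartitionKey { get; set; } = "";
    public string Subject { get; set; } = "";
}

public static class PartitionKeys
{
    // Assumed key format: one partition key value per user.
    public static string ForUser(string userId) => $"user-{userId}";
}

public static class Example
{
    public static void Main()
    {
        var key = PartitionKeys.ForUser("42");

        // Different document types, same partition key value.
        var profile = new UserProfile { Id = "42", PartitionKey = key };
        var email = new UserEmail { Id = "e-1", PartitionKey = key, Subject = "Welcome" };

        Console.WriteLine($"{profile.Type} -> {profile.PartitionKey}");
        Console.WriteLine($"{email.Type} -> {email.PartitionKey}");
    }
}
```

With a layout like this, a query for one user's emails can name both the partition key value and the type, so it stays within a single known partition.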

If you need proof that this is an appropriate way to use Cosmos DB, consider the recent addition of the Gremlin Graph API. Graphs are inherently heterogeneous, as they contain many different entities and entity types, as well as the relationships between them. The query boundary in Cosmos is at the collection level, so if you tried to put your entities in different collections, none of the Graph API functionality would work.

EDIT: I noticed in the comments that you made this statement: "And you would have an index on every field in both objects." Cosmos DB does automatically index every field of every document. It uses a special proprietary path-based indexing mechanism that ensures every path in your JSON tree is indexed. You have to specifically opt out of this automatic indexing.
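For reference, opting out is done per path through the collection's indexing policy. The fragment below is a minimal sketch that keeps automatic indexing for everything except one subtree; the /body path is a hypothetical property name chosen for illustration (for example, a large email body you never filter on).

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/body/*" }
  ]
}
```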
