MongoDB: create a top-level index for a nested document instead of indexing every single sub-level?

This question is about how to use indexes in MongoDB to query nested documents without indexing every single sub-level. I have a collection "test" in MongoDB that basically looks like this:

{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : {
        "1" : [1,2,3],
        "2" : [4,5,6]
}}

      

The scenario field has multiple keys, and each document can have any subset of the scenarios (i.e. anything from one to all of them). Also: scenario cannot be an array, because I need it as a dictionary in Python. I have created an index on the scenario field.
My problem is that I want to run selects on the collection, filtering for documents that contain a specific scenario. Functionally, this works perfectly:

db.test.find({"scenario.1": {$exists: true}})

      

However, it does not use the index I created on scenario. An index is used only if I create it on "scenario.1". But I may have thousands of scenarios (or more), and the collection itself contains 100,000 documents, so I would rather not do that! So I tried an alternative:

db.test.find({"scenario": "1"}) 

      

This does use the index on scenario, but it returns no results. Making scenario an array still gives the same index problem.

Is my question clear? Can anyone please provide a pointer to how I could achieve the best performance here?

P.S. I have seen this: "How to create a nested index in MongoDB?", but that solution is not feasible in my case (because of the number of scenarios).

+3




2 answers


Johnny HC's answer is well explained and should be used in the general case. I am only suggesting a workaround for your problem if you must have a large number of scenarios and don't need complex queries. Instead of storing the values inside the scenario field, keep only the scenario IDs in that field and hold the values in separate fields of the document, using the scenario ID as part of each field's key.

Example:



{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [ "1", "2"],
"scenario_1": [1,2,3],
"scenario_2": [4,5,6]
}

      

With this schema, you can use an index on scenario to find documents that have specific scenarios. But if you need to query on specific scenario values, you will again need an index on each scenario value field, i.e. scenario_1, scenario_2, etc. If you need indexes on each field, then don't change the original schema; use sparse indexes on each nested field instead, which can help reduce the size of your indexes.
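A minimal sketch of that conversion in Python, assuming the original document shape from the question. The helper name `flatten_scenarios` is hypothetical, and the `scenario_<id>` naming simply follows the example above; neither is part of any MongoDB API.

```python
def flatten_scenarios(doc):
    """Return a copy of doc with the scenario IDs kept as a list and each
    scenario's values moved to a top-level "scenario_<id>" field."""
    flat = {key: value for key, value in doc.items() if key != "scenario"}
    scenarios = doc.get("scenario", {})
    # List of IDs only; a single index on this array field covers all of them.
    flat["scenario"] = sorted(scenarios)
    for scenario_id, values in scenarios.items():
        flat["scenario_" + scenario_id] = values
    return flat


original = {
    "othercol": "bladiebla",
    "scenario": {"1": [1, 2, 3], "2": [4, 5, 6]},
}
flat = flatten_scenarios(original)
# flat["scenario"] is now ["1", "2"], so a filter such as {"scenario": "1"}
# matches this document and can be served by the index on "scenario".
```

The values remain reachable by key (`flat["scenario_1"]`), at the cost of one top-level field per scenario.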

+2




Placing an index on a subobject, for example scenario, is useless in this case, because it will only be used when filtering on complete scenario objects, not on individual fields (think of it as a binary blob comparison).

You need to either add an index for each of the possible fields ("scenario.1", "scenario.2", etc.), or rework your schema to get rid of the dynamic keys by doing something like this:

{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [
    { id: "1", value: [1,2,3] },
    { id: "2", value: [4,5,6] }
]}

      



You can then add a single index on scenario.id to support the queries you need to execute.

I know you said you need scenario to be a dict and not an array, but I don't see how you have a choice.
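If the only obstacle is needing a dict on the Python side, one option is to convert at the application boundary. The sketch below assumes the array-of-subdocuments shape shown above; the function names are hypothetical and nothing here depends on a driver.

```python
def scenarios_to_dict(scenario_list):
    """Turn [{"id": "1", "value": [...]}, ...] into {"1": [...], ...}."""
    return {item["id"]: item["value"] for item in scenario_list}


def scenarios_to_list(scenario_dict):
    """Inverse of scenarios_to_dict, for writing the document back."""
    return [{"id": key, "value": value} for key, value in scenario_dict.items()]


# The "scenario" field as it would be stored with the reworked schema.
stored = [
    {"id": "1", "value": [1, 2, 3]},
    {"id": "2", "value": [4, 5, 6]},
]
as_dict = scenarios_to_dict(stored)

# Filter that replaces {"scenario.1": {"$exists": True}}; a single index
# on "scenario.id" can serve it for every scenario ID.
query = {"scenario.id": "1"}
```

The stored form stays indexable with one index, while the rest of the Python code keeps working with `as_dict`.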

+4

