Slice array based on match conditions with $ lookup

I am discovering Mongodb and its pipelined queries and I am struggling to get the job done.

I search each pathsList document where I can find the path from position B to position C

  • Inputs: 10 and 12
  • Output: [pathsList] (I expect this example to only have 1 result, but rather an array later)

Let's say I have 2 following pathsList documents from pathsListsCollection, they have an array of documents path

-------------
pathsList = {
    _id: ObjectId(...),
    pathIds: [path1Id, path2Id, path3Id]
}
-------------
path1 = {
    _id: ObjectId(...),
    positionStart: 8,
    positionFinal: 10,
    index:0
}
-------------
path2 = {
    _id: ObjectId(...),
    positionStart: 10,
    positionFinal: 12,
    index:1
}
-------------
path3 = {
    _id: ObjectId(...),
    positionStart: 12,
    positionFinal: 14,
    index:2
}
-------------

-------------
pathsList = {
    _id: ObjectId(...),
    pathIds: [path4Id, path5Id, path6Id]
}
-------------
path4 = {
    _id: ObjectId(...),
    positionStart: 14,
    positionFinal: 12,
    index:0
}
-------------
path5 = {
    _id: ObjectId(...),
    positionStart: 12,
    positionFinal: 10,
    index:1
}
-------------
path6 = {
    _id: ObjectId(...),
    positionStart: 10,
    positionFinal: 8,
    index:2
}
-------------

      

So far I have done something like this:

pathsListCollection.aggregate([
    {
    $lookup:{
        from: "pathsCollection",
        localField: "pathIds",
        foreignField: "_id",
        as: paths
        }
    },
    {
        $match:{
            paths.positionStart : 10 // first input value
        }
    },
    {
        $match:{
            paths.positionFinal : 12  // second input value
        }
    },
])

      

By doing this, I get two pathsList files.

Now, how can I change this aggregate to only find the one that has positionStart = 10 and positionFinal = 12 in that particular order How can I try to validate the second expression only if the first one is already validated?

I looked around $ slice trying to slice the path array after the first match of $ and continue with the rest of the query, but couldn't find revelent syntax to do so.

ie: working with the first pathsList, I reach the first $ match with the paths array like this:

[{
    _id: ObjectId(...),
    positionStart: 8,
    positionFinal: 10,
    index:0
},
{
    _id: ObjectId(...),
    positionStart: 10,     // first $match here
    positionFinal: 12,
    index:1
},
{
    _id: ObjectId(...),
    positionStart: 12,
    positionFinal: 14,
    index:2
}]

      

and I want to do the following matching $ in an array like this:

[{
    _id: ObjectId(...),
    positionStart: 10,
    positionFinal: 12,    // second $match has to start from this elem of the array
    index:1
},
{
    _id: ObjectId(...),
    positionStart: 12,
    positionFinal: 14,
    index:2
}]

      

Can this be done? would it be easier to ask for pathCollection paths rather than pathsListsCollection in the first place?

However, I am open to any other approach or at least that could help me solve this problem Thanks in advance

+3


source to share


1 answer


If I get it, if you want to "cut" on the basis of compliance positionStart

and positionFinish

here.

Optimal case

Actually it would be better for performance if you actually started the query in "pathsCollection"

instead, because that is where you are going to match items. So it $match

should be done "first" and using $or

to "slice" the entire range:

Then with $lookup

you should only return the elements "pathsListCollection"

formatted with the "sliced" array:

In a minimal sample of your data, the transition from "start" 10

to "finish" 12

will look like this:

db.getCollection('pathsCollection').aggregate([
  { "$match": {
    "positionStart": { "$gte": 10, "$lte": 12 },
    "positionFinal": { "$gte": 10, "$lte": 12 }
  }},
  { "$lookup": {
    "from": "pathsListCollection",
    "localField": "_id",
    "foreignField": "pathIds",
    "as": "pathsList"
  }},
  { "$unwind": "$pathsList" },
  { "$addFields": {
    "pathsList.pathIds": {
      "$filter": {
        "input": "$pathsList.pathIds",
        "as": "p",
        "cond": { "$eq": [ "$_id", "$$p" ] }
      }
    }
  }},
  { "$unwind": "$pathsList.pathIds" },
  { "$group": {
    "_id": "$pathsList._id",
    "pathIds": {
      "$push": {
        "_id": "$_id",
        "positionStart": "$positionStart",
        "positionFinal": "$positionFinal",
        "index": "$index"  
      }  
    }  
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionStart",
              0 
            ]},
            10
          ]},
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionFinal",
              -1
            ]},
            12
          ]}  
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"  
    }  
  }}
])

      

Produces:

/* 1 */
{
    "_id" : ObjectId("595db5d8f5f11516540d1185"),
    "pathIds" : [ 
        {
            "_id" : ObjectId("595db4c7f5f11516540d1183"),
            "positionStart" : 10.0,
            "positionFinal" : 12.0,
            "index" : 1.0
        }
    ]
}

      

And with decreasing "range" to "start" 10

and "finish" 14

it will be:

db.getCollection('pathsCollection').aggregate([
  { "$match": {
    "positionStart": { "$gte": 10, "$lte": 14 },
    "positionFinal": { "$gte": 10, "$lte": 14 }
  }},
  { "$lookup": {
    "from": "pathsListCollection",
    "localField": "_id",
    "foreignField": "pathIds",
    "as": "pathsList"
  }},
  { "$unwind": "$pathsList" },
  { "$addFields": {
    "pathsList.pathIds": {
      "$filter": {
        "input": "$pathsList.pathIds",
        "as": "p",
        "cond": { "$eq": [ "$_id", "$$p" ] }
      }
    }
  }},
  { "$unwind": "$pathsList.pathIds" },
  { "$group": {
    "_id": "$pathsList._id",
    "pathIds": {
      "$push": {
        "_id": "$_id",
        "positionStart": "$positionStart",
        "positionFinal": "$positionFinal",
        "index": "$index"  
      }  
    }  
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionStart",
              0 
            ]},
            10
          ]},
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionFinal",
              -1
            ]},
            14
          ]}  
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"  
    }  
  }}
])

      

Production:

/* 1 */
{
    "_id" : ObjectId("595db5d8f5f11516540d1185"),
    "pathIds" : [ 
        {
            "_id" : ObjectId("595db4c7f5f11516540d1183"),
            "positionStart" : 10.0,
            "positionFinal" : 12.0,
            "index" : 1.0
        }, 
        {
            "_id" : ObjectId("595db4c7f5f11516540d1184"),
            "positionStart" : 12.0,
            "positionFinal" : 14.0,
            "index" : 2.0
        }
    ]
}

      


Reverse case



It might look a little shorter in syntax, but it is probably not the most efficient option, given that you cannot "query" "pathsCollection"

otherwise before $lookup

:

db.pathsListCollection.aggregate([
  { "$lookup": {
    "from": "pathsCollection",
    "localField": "pathIds",
    "foreignField": "_id",
    "as": "pathIds"    
  }},
  { "$unwind": "$pathIds" },
  { "$match": {
    "pathIds.positionStart": { "$gte": 10, "$lte": 14 },
    "pathIds.positionFinal": { "$gte": 10, "$lte": 14 }
  }},
  { "$group": {
    "_id": "$_id",
    "pathIds": { "$push": "$pathIds" }    
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionStart",
              0 
            ]},
            10
          ]},
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionFinal",
              -1
            ]},
            14
          ]}  
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"  
    }  
  }}
])

      

This is the most optimal form of the "reverse case" given in relation to how MongoDB actually produces $lookup

when applied to the server, which can be seen in the "explain" output:

    {
        "$lookup" : {
            "from" : "pathsCollection",
            "as" : "pathIds",
            "localField" : "pathIds",
            "foreignField" : "_id",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "$and" : [ 
                    {
                        "positionStart" : {
                            "$gte" : 10.0
                        }
                    }, 
                    {
                        "positionStart" : {
                            "$lte" : 14.0
                        }
                    }, 
                    {
                        "positionFinal" : {
                            "$gte" : 10.0
                        }
                    }, 
                    {
                        "positionFinal" : {
                            "$lte" : 14.0
                        }
                    }
                ]
            }
        }
    }, 
    {
        "$group" : {

      

Showing that $unwind

and $match

"magically" disappeared. Of course, they have now been "collapsed" into $lookup

, so when the related data is requested, you actually get only those results that match the condition.

The "suboptimal" approach will be instead $filter

. But actually ALL results from the associated collection are returned and then only removed after the "full" array has already been presented:

db.pathsListCollection.aggregate([
  { "$lookup": {
    "from": "pathsCollection",
    "localField": "pathIds",
    "foreignField": "_id",
    "as": "pathIds"    
  }},
  { "$addFields": {
    "pathIds": {
      "$filter": {
        "input": "$pathIds",
        "as": "p",
        "cond": {
          "$and": [
            { "$gte": [ "$$p.positionStart", 10 ] },
            { "$lte": [ "$$p.positionStart", 14 ] },            
            { "$gte": [ "$$p.positionFinal", 10 ] },
            { "$lte": [ "$$p.positionFinal", 14 ] },            
          ]
        }
      } 
    }  
  }},
  { "$match": {
    "pathIds": {
      "$elemMatch": {
        "positionStart": { "$gte": 10, "$lte": 14 },
        "positionFinal": { "$gte": 10, "$lte": 14 }
      }
    }
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionStart",
              0 
            ]},
            10
          ]},
          { "$eq": [
            { "$arrayElemAt": [
              "$pathIds.positionFinal",
              -1
            ]},
            14
          ]}  
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"  
    }  
  }}
])

      

Also noting that you still need $match

or $redact

given that the resulting array entries still satisfy the conditions in this case and that the array was not actually "empty" in the result . $filter


Samples used

pathsCollection

/* 1 */
{
    "_id" : ObjectId("595db4c7f5f11516540d1182"),
    "positionStart" : 8.0,
    "positionFinal" : 10.0,
    "index" : 0.0
}

/* 2 */
{
    "_id" : ObjectId("595db4c7f5f11516540d1183"),
    "positionStart" : 10.0,
    "positionFinal" : 12.0,
    "index" : 1.0
}

/* 3 */
{
    "_id" : ObjectId("595db4c7f5f11516540d1184"),
    "positionStart" : 12.0,
    "positionFinal" : 14.0,
    "index" : 2.0
}

/* 4 */
{
    "_id" : ObjectId("595db616f5f11516540d1186"),
    "positionStart" : 14.0,
    "positionFinal" : 12.0,
    "index" : 0.0
}

/* 5 */
{
    "_id" : ObjectId("595db616f5f11516540d1187"),
    "positionStart" : 12.0,
    "positionFinal" : 10.0,
    "index" : 1.0
}

/* 6 */
{
    "_id" : ObjectId("595db616f5f11516540d1188"),
    "positionStart" : 10.0,
    "positionFinal" : 8.0,
    "index" : 2.0
}

      

pathsListCollection

/* 1 */
{
    "_id" : ObjectId("595db5d8f5f11516540d1185"),
    "pathIds" : [ 
        ObjectId("595db4c7f5f11516540d1182"), 
        ObjectId("595db4c7f5f11516540d1183"), 
        ObjectId("595db4c7f5f11516540d1184")
    ]
}

/* 2 */
{
    "_id" : ObjectId("595db62df5f11516540d1189"),
    "pathIds" : [ 
        ObjectId("595db616f5f11516540d1186"), 
        ObjectId("595db616f5f11516540d1187"), 
        ObjectId("595db616f5f11516540d1188")
    ]
}

      

+1


source







All Articles