Filter results by a field value of the last array element

Given this document structure (irrelevant fields omitted for brevity):

[
    {
        "_id" : 0,
        "partn" : [ 
            {
                "date" : ISODate("2015-07-28T00:59:14.963Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-07-28T01:00:32.771Z"),
                "is_partner" : false
            }, 
            {
                "date" : ISODate("2015-07-28T01:15:29.916Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-08-05T13:48:07.035Z"),
                "is_partner" : false
            }, 
            {
                "date" : ISODate("2015-08-05T13:50:56.482Z"),
                "is_partner" : true
            }
        ]
    },
    {
        "_id" : 149,
        "partn" : [ 
            {
                "date" : ISODate("2015-07-30T12:42:18.894Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-07-31T00:01:51.176Z"),
                "is_partner" : false
            }
        ]
    }
]

      

I need to filter documents where partn.is_partner of the last array element is true. Is this the best way to do it?

db.somedb
    .aggregate([ 
        // pre-filter only the docs with at least one is_partner === true, is it efficient/needed?
        {$match: {partn: { $elemMatch: { is_partner: true } } } },
        {$unwind: '$partn'},
        // do I need to sort by _id too, here?
        {$sort: {_id: 1, 'partn.date': 1} },
        // then group back fetching the last one by _id
        {$group : {
           _id : '$_id',
           partn: {$last: '$partn'},
        }},
        // and return only those with is_partner === true
        {$match: {'partn.is_partner': true } },
    ])
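
To make the semantics of this pipeline concrete, here is a plain-JavaScript sketch (not MongoDB code) that imitates each stage on an in-memory array; lastPartnerViaUnwind is a hypothetical name and the data is the question's sample. Note that the first $match cannot change the final result, because a document with no is_partner: true element cannot end with one; it only reduces the work for the later stages.

```javascript
// Hypothetical in-memory stand-in for the collection, using the question's sample data.
const docs = [
  { _id: 0, partn: [
    { date: new Date("2015-07-28T00:59:14.963Z"), is_partner: true },
    { date: new Date("2015-07-28T01:00:32.771Z"), is_partner: false },
    { date: new Date("2015-07-28T01:15:29.916Z"), is_partner: true },
    { date: new Date("2015-08-05T13:48:07.035Z"), is_partner: false },
    { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true },
  ] },
  { _id: 149, partn: [
    { date: new Date("2015-07-30T12:42:18.894Z"), is_partner: true },
    { date: new Date("2015-07-31T00:01:51.176Z"), is_partner: false },
  ] },
];

// Plain-JS imitation of the question's pipeline, one step per stage.
function lastPartnerViaUnwind(collection) {
  // $match: only docs with at least one element where is_partner is true
  const matched = collection.filter(d => d.partn.some(p => p.is_partner === true));
  // $unwind: one record per array element, carrying the parent _id
  const unwound = matched.flatMap(d => d.partn.map(p => ({ _id: d._id, partn: p })));
  // $sort: { _id: 1, "partn.date": 1 }
  unwound.sort((a, b) => (a._id - b._id) || (a.partn.date - b.partn.date));
  // $group with $last: each later record for an _id overwrites the earlier one,
  // so the map ends up holding the last (latest-dated) element per _id
  const last = new Map();
  for (const rec of unwound) last.set(rec._id, rec.partn);
  // final $match: keep only groups whose last element has is_partner true
  return [...last]
    .map(([_id, partn]) => ({ _id, partn }))
    .filter(g => g.partn.is_partner === true);
}

console.log(lastPartnerViaUnwind(docs));
```

Unlike this sketch, MongoDB's $group does not guarantee the order of the documents it outputs.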

      

I get what I need, but as a non-expert MongoDB developer I suspect there is some overhead in this aggregation. I was thinking about just fetching the last entry in each partn array, but the collection sometimes needs to be exported/imported, and if I remember correctly the sort order can change; aggregating and sorting by date sidesteps that problem.

Is this the best (most efficient) way to do it? If not, why not?

Thanks. (By the way, this is MongoDB 2.6.)

1 answer


Your mileage may vary, and it may well be that the process you are currently running is the "most suitable" one. But we can probably make it more efficient.

What you can do now

If your arrays are already "ordered" by using the $sort modifier with $push, you can probably do this:

db.somedb.find(
  { 
    "partn.is_partner": true,
    "$where": function() {
      return this.partn.slice(-1)[0].is_partner == true;
    }
  },
  { "partn": { "$slice": -1 } }
)

      

So, provided "partn.is_partner" is indexed, this is still fairly efficient, since the initial query condition can be met using the index. The part that cannot use the index is the $where clause, which relies on JavaScript evaluation.

What the second part, the $where, does is simply "slice" the last element from the array and test its is_partner property to see if it is true. The document is returned only if this condition is also met.
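
The following plain-JavaScript sketch (not MongoDB code) applies the same two query conditions and the $slice projection to an in-memory array, to show what the query selects and returns. The data is a trimmed, hypothetical version of the question's documents, already stored in date order, and findLastPartner and lastIsPartner are illustrative names.

```javascript
// Hypothetical sample data; each partn array is assumed to be stored in date order already.
const docs = [
  { _id: 0, partn: [
    { date: new Date("2015-08-05T13:48:07.035Z"), is_partner: false },
    { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true },
  ] },
  { _id: 149, partn: [
    { date: new Date("2015-07-30T12:42:18.894Z"), is_partner: true },
    { date: new Date("2015-07-31T00:01:51.176Z"), is_partner: false },
  ] },
];

// The same test the $where function runs: is the last element a partner?
function lastIsPartner(doc) {
  return doc.partn.slice(-1)[0].is_partner == true;
}

// Imitates find({ "partn.is_partner": true, $where: ... }, { partn: { $slice: -1 } }).
function findLastPartner(collection) {
  return collection
    // indexable condition: at least one element has is_partner: true
    // (this also rules out empty arrays before lastIsPartner runs)
    .filter(d => d.partn.some(p => p.is_partner === true))
    // JavaScript-evaluated condition, like $where
    .filter(lastIsPartner)
    // $slice: -1 projection returns only the final array element
    .map(d => ({ _id: d._id, partn: d.partn.slice(-1) }));
}

console.log(JSON.stringify(findLastPartner(docs)));
```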

There is also the $slice projection operator, which does the same thing in returning only the last element from the array. False matches have already been filtered out, so this shows only the last element, whose is_partner is true.

Combined with the index mentioned above, this should be fairly fast, since the documents are already selected and the JavaScript condition just filters the rest. Note that without another field with a standard query condition to match, a $where clause cannot use an index. So always try to use $where "sparingly", together with other query conditions.

What you can do in the future

Next up, not yet available at the time of writing but certainly coming in the near future, is the $slice operator for the aggregation framework. It is currently in the development branch, but here's a look at how it works:

db.somedb.aggregate([
  { "$match": { "partn.is_partner": true } },
  { "$redact": {
    "$cond": {
      "if": { 
        "$anyElementTrue": {
          "$map": {
            "input": { "$slice": ["$partn",-1] },
            "as": "el",
            "in": "$$el.is_partner"
          }
        }
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project": {
      "partn": { "$slice": [ "$partn",-1 ] }
  }}
])

      

Combining $slice with $redact here lets documents be filtered by a logical condition that tests each document. In this case, $slice produces an array holding only the last element, which is sent to $map to extract just its is_partner value (still as an array). Since this is at best a single-element array, it is then tested with $anyElementTrue, which turns it into a single boolean result suitable for $cond.

Based on that result, $redact decides whether to $$KEEP the document or $$PRUNE it from the results. Later we use $slice again in the $project stage to return just the last element of the array after filtering.
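
The $redact condition can be traced step by step in plain JavaScript. This is an in-memory sketch, not MongoDB code: redactDecision and aggregateLastPartner are hypothetical names, and is_partner is assumed to always be a boolean. The empty-array document illustrates that $anyElementTrue over an empty array yields false, so such a document is pruned rather than causing an error.

```javascript
// Hypothetical sample documents; is_partner is assumed to always be a boolean.
const docs = [
  { _id: 0, partn: [{ is_partner: false }, { is_partner: true }] },
  { _id: 149, partn: [{ is_partner: true }, { is_partner: false }] },
  { _id: 7, partn: [] }, // hypothetical edge case: empty array
];

// Step-by-step imitation of the $redact condition for one document.
function redactDecision(doc) {
  // { $slice: ["$partn", -1] } -> array with at most the last element
  const sliced = doc.partn.slice(-1);
  // $map with "in": "$$el.is_partner" -> array holding that element's flag
  const flags = sliced.map(el => el.is_partner);
  // $anyElementTrue -> a single boolean (false for an empty array)
  const keep = flags.some(v => v === true);
  // $cond -> the system variable handed to $redact
  return keep ? "$$KEEP" : "$$PRUNE";
}

// $match, then $redact, then $project with $slice, over the in-memory array.
function aggregateLastPartner(collection) {
  return collection
    .filter(d => d.partn.some(p => p.is_partner === true))
    .filter(d => redactDecision(d) === "$$KEEP")
    .map(d => ({ _id: d._id, partn: d.partn.slice(-1) }));
}

for (const d of docs) console.log(d._id, redactDecision(d));
```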

This works exactly as the JavaScript version does, except that it uses only natively coded operators and therefore should be a bit faster than the JavaScript alternative.

Both forms return your first document as expected, with partn reduced to its last element by $slice:

{
    "_id" : 0,
    "partn" : [
            {
                    "date" : ISODate("2015-08-05T13:50:56.482Z"),
                    "is_partner" : true
            }
    ]
}

The big catch with both approaches is that your array must already be sorted so that the latest date is the last element. Without that, you would need to $unwind and $sort the array in the aggregation pipeline, as you are doing now.

That is inefficient, so you should "pre-sort" your array and keep it in order on every update.
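
Because both approaches rely on the position of the element, it can help to check whether an array really is in date order. Below is a minimal plain-JavaScript sketch with hypothetical helper names (isSortedByDate, latestByDate) and a hypothetical out-of-order array built from two of the question's entries; it shows how the positional "last" element and the latest-dated element can disagree.

```javascript
// True when each element's date is not earlier than the one before it,
// i.e. the ascending order the $slice: -1 approaches depend on.
function isSortedByDate(arr) {
  return arr.every((el, i) => i === 0 || arr[i - 1].date <= el.date);
}

// The element with the latest date regardless of stored order (assumes a non-empty array).
function latestByDate(arr) {
  return arr.reduce((best, el) => (el.date > best.date ? el : best));
}

// Hypothetical out-of-order array built from two of the question's entries.
const shuffled = [
  { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true },
  { date: new Date("2015-07-28T01:00:32.771Z"), is_partner: false },
];

console.log(isSortedByDate(shuffled));          // false
console.log(shuffled.slice(-1)[0].is_partner);  // false: the positional "last" is wrong here
console.log(latestByDate(shuffled).is_partner); // true: the actual latest entry
```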

As a handy trick, this will actually reorder all array elements across all documents in the collection in one simple statement:

db.somedb.update(
    {},
    { "$push": { 
        "partn": { "$each": [], "$sort": { "date": 1 } }
    }},
    { "multi": true }
)

      

So even if you don't "push" a new element into the array and only update a property, you can always apply this basic construct to get the array ordered the way you want.
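
In plain JavaScript terms, the update above amounts to sorting each document's partn array by date in place, since $each: [] adds nothing. This is an in-memory sketch for illustration only (pushSortAll and the sample collection are hypothetical), not a replacement for running the update on the server.

```javascript
// Hypothetical in-memory collection with an out-of-order partn array.
const docs = [
  { _id: 0, partn: [
    { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true },
    { date: new Date("2015-07-28T00:59:14.963Z"), is_partner: true },
    { date: new Date("2015-07-28T01:00:32.771Z"), is_partner: false },
  ] },
];

// Imitates update({}, { $push: { partn: { $each: [], $sort: { date: 1 } } } }, { multi: true }):
// for every document, append nothing ($each: []) and then sort the array by date ascending.
function pushSortAll(collection) {
  for (const doc of collection) {
    doc.partn.sort((a, b) => a.date - b.date); // $sort: { date: 1 }
  }
  return collection;
}

pushSortAll(docs);
console.log(docs[0].partn.map(p => p.date.toISOString()));
```

After this, slicing the last element gives the latest entry, which is what both queries above assume.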

It is worth considering as it should make things much faster.
