Mongo aggregation within time intervals

I have some log data stored in a mongo collection that includes basic information such as request_id and the time it was added to the collection, for example:

{
    "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
    "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
    "time" : ISODate("2015-07-21T16:00:00.00Z")
}

      

I was wondering if I could use the aggregation framework to compute some statistics. I would like to get the counts of objects created during each interval of N minutes in the last X hours.

Thus, the output I need for 10 minute intervals over the last 1 hour should look something like this:

{ "_id" : 0, "time" : ISODate("2015-07-21T15:00:00.00Z"), "count" : 67 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:10:00.00Z"), "count" : 113 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:20:00.00Z"), "count" : 40 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:30:00.00Z"), "count" : 10 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:40:00.00Z"), "count" : 32 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:50:00.00Z"), "count" : 34 }

      

I would use this to get data for graphs.

Any advice is appreciated!

3 answers


Something like this?

pipeline = [
    {"$project":
        {"date": {
            "year": {"$year": "$time"},
            "month": {"$month": "$time"},
            "day": {"$dayOfMonth": "$time"},
            "hour": {"$hour": "$time"},
            "minute": {"$subtract": [
                {"$minute": "$time"},
                {"$mod": [{"$minute": "$time"}, 10]}
            ]}
        }}
    },
    {"$group": {"_id": "$date", "count": {"$sum": 1}}}
]

      

Example:



> db.foo.insert({"time": new Date(2015,  7, 21, 22, 21)})
> db.foo.insert({"time": new Date(2015,  7, 21, 22, 23)})
> db.foo.insert({"time": new Date(2015,  7, 21, 22, 45)})
> db.foo.insert({"time": new Date(2015,  7, 21, 22, 33)})
> db.foo.aggregate(pipeline)

      

and the output is:

{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 40 }, "count" : 1 }
{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 20 }, "count" : 2 }
{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 30 }, "count" : 1 }

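The grouped _id here is a plain sub-document rather than a date. As a minimal sketch of turning it back into a Date on the client, assuming the parts are UTC values (which is what the date operators report) and using a hypothetical helper name keyToDate: note that $month is 1-based while the JavaScript Date constructor takes a 0-based month, which is why new Date(2015, 7, ...) in the example shows up as month 8. The hour presumably differs from the inserted 22 because the shell builds the date in local time.

```javascript
// Hypothetical helper: rebuild a Date from the composite _id produced by the pipeline.
// Assumes the parts are UTC values, as reported by $year, $month, $dayOfMonth, etc.
function keyToDate(key) {
    // $month is 1-based; Date.UTC expects a 0-based month index.
    return new Date(Date.UTC(key.year, key.month - 1, key.day, key.hour, key.minute));
}

const sample = { year: 2015, month: 8, day: 21, hour: 20, minute: 40 };
console.log(keyToDate(sample).toISOString()); // 2015-08-21T20:40:00.000Z
```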
      


There are several ways to approach this, depending on which output format best suits your needs. The main caveat is that with the aggregation framework you cannot actually return something cast as a date, but you can get values that are easily reconstructed into a Date object when processing the results in your API.

The first approach is to use the date aggregation operators available for the aggregation framework:

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        "_id": {
            "year": { "$year": "$time" },
            "dayOfYear": { "$dayOfYear": "$time" },
            "hour": { "$hour": "$time" },
            "minute": {
                "$subtract": [
                    { "$minute": "$time" },
                    { "$mod": [ { "$minute": "$time" }, 10 ] }
                ]
            }
        },
        "count": { "$sum": 1 }
    }}
])

      

This returns a composite key for _id containing all the values you need to rebuild a date. Alternatively, if the range only ever covers a single hour, just use the "minute" part and work out the actual date from the startDate of your range selection.

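A composite key with dayOfYear can be converted back on the client as well. The following is only a sketch, assuming UTC parts and a hypothetical helper name dayOfYearKeyToDate; it relies on Date.UTC rolling an out-of-range day number over into later months.

```javascript
// Hypothetical helper for a { year, dayOfYear, hour, minute } key.
// Assumes UTC parts and a 1-based dayOfYear, as $dayOfYear reports.
function dayOfYearKeyToDate(key) {
    // Month 0 with day N: days past 31 roll over into later months,
    // so day 202 of 2015 lands on 21 July.
    return new Date(Date.UTC(key.year, 0, key.dayOfYear, key.hour, key.minute));
}

const bucket = dayOfYearKeyToDate({ year: 2015, dayOfYear: 202, hour: 15, minute: 20 });
console.log(bucket.toISOString()); // 2015-07-21T15:20:00.000Z
```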
Or, you can simply use "date math" to get the milliseconds since the epoch, which can then be fed directly to a Date constructor.

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        "_id": {
            "$subtract": [
               { "$subtract": [ "$time", new Date(0) ] },
               { "$mod": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   1000 * 60 * 10
               ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
])

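To see what the nested $subtract and $mod expression computes, here is the same arithmetic in plain JavaScript. This is an illustrative sketch only; floorToInterval and INTERVAL_MS are hypothetical names, not part of any MongoDB API.

```javascript
// Plain-JavaScript equivalent of the $subtract / $mod expression in the pipeline.
const INTERVAL_MS = 1000 * 60 * 10; // 10 minutes, as in the pipeline

function floorToInterval(date, intervalMs) {
    const ms = date.getTime();     // same value as { $subtract: [ "$time", new Date(0) ] }
    return ms - (ms % intervalMs); // drop the remainder within the interval
}

const bucketMs = floorToInterval(new Date("2015-07-21T15:27:13Z"), INTERVAL_MS);
console.log(new Date(bucketMs).toISOString()); // 2015-07-21T15:20:00.000Z
```

Because the result is a number of milliseconds, every document in the same 10-minute window gets the same grouping key.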
      

In all cases, what you do not want to do is use $project before applying $group. As a pipeline stage, $project must cycle through all selected documents and transform their content. This takes time and adds to the total execution time of the query. You can simply apply the expressions in $group directly, as shown.

Or, if you are really a purist about getting a Date object back without post-processing, you can always use mapReduce, since JavaScript functions do allow recasting as a date. It is slower than the aggregation framework, though, and of course gives you no cursor response:

db.collection.mapReduce(
   function() {
       var date = new Date(
           this.time.valueOf() 
           - ( this.time.valueOf() % ( 1000 * 60 * 10 ) )
       );
       emit(date,1);
   },
   function(key,values) {
       return Array.sum(values);
   },
   { "out": { "inline": 1 } }
)

      

Your best bet is to use aggregation, since transforming the response is quite simple:

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        "_id": {
            "$subtract": [
               { "$subtract": [ "$time", new Date(0) ] },
               { "$mod": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   1000 * 60 * 10
               ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
]).forEach(function(doc) {
    // _id is milliseconds since the epoch, so it can be passed straight to Date
    doc._id = new Date(doc._id);
    printjson(doc);
})

      

And then you have interval grouping output with real Date objects.

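The startDate and endDate values used in the $match stages are left undefined in the snippets above. One possible way to derive them for "the last X hours in N-minute intervals" is sketched below. The helper name lastHoursRange is hypothetical, and ending the range at the most recent interval boundary is an assumption rather than something the answer specifies.

```javascript
// Hypothetical helper: compute the [startDate, endDate) range for the $match stage.
// Assumption: the range ends at the most recent interval boundary.
function lastHoursRange(now, hours, intervalMinutes) {
    const intervalMs = intervalMinutes * 60 * 1000;
    const endMs = now.getTime() - (now.getTime() % intervalMs);
    const startMs = endMs - hours * 60 * 60 * 1000;
    return { startDate: new Date(startMs), endDate: new Date(endMs) };
}

const range = lastHoursRange(new Date("2015-07-21T16:03:30Z"), 1, 10);
console.log(range.startDate.toISOString()); // 2015-07-21T15:00:00.000Z
console.log(range.endDate.toISOString());   // 2015-07-21T16:00:00.000Z
```

With these values, a 10-minute grouping over one hour yields at most six buckets, from 15:00 through 15:50, matching the shape of the output asked for in the question.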

This is a pointer rather than a specific answer. You can easily do this for minutes, hours, and other fixed periods using the date aggregation operators. Ten-minute intervals will be a little more difficult, but are probably doable with some wrangling. However, aggregation will be slow as nuts on large datasets.

I would suggest extracting the minutes when inserting:

{
    "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
    "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
    "time" : ISODate("2015-07-21T16:00:00.00Z"),
    "minutes": 16
}

      

and, even though it sounds downright absurd, adding quartiles and sextiles, or whatever N may be:

{
    "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
    "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
    "time" : ISODate("2015-07-21T16:00:00.00Z"),
    "minutes": 16,
    "quartile": 1,
    "sextile": 2
}

      

A first thought might be to use $divide on the minutes, but it does not do ceil or floor. See also:

Is there a floor function in Mongodb's aggregation framework?

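As a rough sketch of the precomputation idea, the extra fields could be derived in application code just before the insert. The helper name withBucketFields is hypothetical, and the zero-based bucket indices are an assumption; the example document above may follow a different convention.

```javascript
// Hypothetical helper: add precomputed bucket fields before inserting a document.
// Field names follow the example document; zero-based indices are an assumption.
function withBucketFields(doc) {
    const minutes = doc.time.getUTCMinutes();
    return Object.assign({}, doc, {
        minutes: minutes,
        quartile: Math.floor(minutes / 15), // 0..3, one per 15-minute block
        sextile: Math.floor(minutes / 10)   // 0..5, one per 10-minute block
    });
}

const doc = withBucketFields({ request_id: "example", time: new Date("2015-07-21T16:16:00Z") });
console.log(doc.minutes, doc.quartile, doc.sextile); // 16 1 1
```

Grouping then becomes a plain $group on the stored field, at the cost of extra storage and of fixing the interval sizes in advance.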