Aggregate group with multiple fields

Given the following dataset:

{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }

      

I am trying to achieve the following result. It would be great to just return the totals greater than zero, and also compress each city, cat, and Qx in just one entry.

{
    "city" : "Yuma",
    "cat" : "roads",
    "Q2total" : 40
}, 
{
    "city" : "Reno",
    "cat" : "roads",
    "Q1total" : 30
},
{
    "city" : "Reno",
    "cat" : "roads",
    "Q4total" : 60
},
{
    "city" : "Yuma",
    "cat" : "parks",
    "Q3total" : 45
},
{
    "city" : "Reno",
    "cat" : "parks",
    "Q1total" : 35
},
{
    "city" : "Yuma",
    "cat" : "roads",
    "Q4total" : 20
}

      

Possible?

+3


source to share


1 answer


We might ask for what purpose? There is already a good consistent object structure in your docs, which is recommended. Having objects with different keys is a great idea. Data is "data" and doesn't really have to be key names.

With this in mind, the aggregation framework actually follows this meaning and does not allow generating arbitrary key names from the data contained in the document. But you can get a similar result with the output as data points:

db.junk.aggregate([
    // Aggregate first to reduce the pipeline documents somewhat
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat"
        },
        "Q1": { "$sum": "$Q1" },
        "Q2": { "$sum": "$Q2" },
        "Q3": { "$sum": "$Q3" },
        "Q4": { "$sum": "$Q4" }
    }},

    // Convert the "quarter" elements to array entries with the same keys
    { "$project": {
        "totals": {
            "$map": {
                "input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ "$$el", "Q1" ] },
                    { "quarter": "$$el", "total": "$Q1" },
                    { "$cond": [
                        { "$eq": [ "$$el", "Q2" ] },
                        { "quarter": "$$el", "total": "$Q2" },
                        { "$cond": [
                           { "$eq": [ "$$el", "Q3" ] },
                           { "quarter": "$$el", "total": "$Q3" },
                           { "quarter": "$$el", "total": "$Q4" }
                        ]}
                    ]}
                ]}
            }
        }
    }},

    // Unwind the array produced
    { "$unwind": "$totals" },

    // Filter any "0" resutls
    { "$match": { "totals.total": { "$ne": 0 } } },

    // Maybe project a prettier "flatter" output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$totals.quarter",
        "total": "$totals.total"
    }}
])

      

This gives you the following results:

{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }

      

You can alternately use mapReduce, which allows "some" flexibility with key names. The trick is that your aggregation is still "quarter", so you need this as part of the primary key, which cannot be changed once emitted.

Also, you can't "filter" any "0" aggregated results without a second pass after outputting to the collection, so it's not very useful for what you want to do, unless you can live with the second mapReduce of a "transform" query. in the output collection.



It is worth noting that if you look at what is being done in the "second" pipeline stage here with $project

and $map

, you will see that the structure of the document essentially changes to some extent, such as that you could alternately structure your documents like initially:

{
    "city" : "Reno", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 35 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 0 }, 
        { "quarter" : "Q4", "total" : 0 }
    ]
},
{ 
    "city" : "Yuma", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 0 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 45 }, 
        { "quarter" : "Q4", "total" : 0 } 
    ]
}

      

Then the aggregation operation becomes simple for your documents, with the same results as above:

db.collection.aggregate([
    { "$unwind": "$totals" },
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat",
            "quarter": "$totals.quarter"
        },
        "ttotal": { "$sum": "$totals.total" }
    }},
    { "$match": { "ttotal": { "$ne": 0 } },
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$_id.quarter",
        "total": "$ttotal"
    }}
])

      

So it might make more sense to consider structuring your documents in a way that starts with it and avoids any of the overhead required to transform the document.

I think you will find that consistent key names make a much better object model for programming, where you have to read the data point from the key value rather than from the key name. If you really need to, it's just a matter of reading data from an object and converting the keys of each already aggregated result into post-processing.

+2


source







All Articles