Aggregating subgroups in order in Elasticsearch

I cannot find the correct syntax to aggregate on a sub-object and order the results by one of its fields (the follower count).

A good example is this Twitter document mapping:

{
  "properties" : {
    "id" : {
      "type" : "long"
    },
    "message" : {
      "type" : "string"
    },
    "user" : {
      "type" : "object",
      "properties" : {
        "id" : {
          "type" : "long"
        },
        "screenName" : {
          "type" : "string"
        },
        "followers" : {
          "type" : "long"
        }
      }
    }
  }
}


How can I get the top influencers for a given set of tweets? The result should be a unique list of the top 10 "user" objects, ordered by the "user.followers" field.

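I've tried a terms aggregation on user.id, ordered by a max sub-aggregation on user.followers and with a top_hits sub-aggregation, but I get an exception: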

org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large for [user.id]

"aggs": {
    "top-influencers": {
      "terms": {
        "field": "user.id",
        "order": {
          "top_hit": "desc"
        }
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit": {
          "max": {
            "field": "user.followers"
          }
        }
      }
    }
  }


I can get almost what I want by using "sort" in the query (no aggregation). However, if a user has multiple tweets, that user appears multiple times in the results. I need to group by the user sub-object and return each user only once.
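If you stay with the plain sorted search, one possible client-side workaround is to walk the hits in order and keep only the first hit for each user.id. This is a minimal sketch, assuming the search response has already been parsed into a Python dict with the usual hits.hits[*]._source shape; the function name unique_top_users and the sample data are hypothetical. Because one user can own many tweets, you may need to request more than 10 hits to end up with 10 distinct users.

```python
from __future__ import annotations

from typing import Any


def unique_top_users(response: dict[str, Any], limit: int = 10) -> list[dict[str, Any]]:
    """Return up to `limit` distinct users, keeping the first hit seen for each user.id.

    Assumes the hits are already sorted by user.followers descending (the
    "sort" in the query), so the first hit per user is the one to keep.
    """
    seen: set[int] = set()
    users: list[dict[str, Any]] = []
    for hit in response["hits"]["hits"]:
        user = hit["_source"]["user"]
        if user["id"] in seen:
            continue  # another tweet from a user we already kept
        seen.add(user["id"])
        users.append(user)
        if len(users) == limit:
            break
    return users


# Hypothetical parsed response: user 1 has two tweets in the results.
sample = {"hits": {"hits": [
    {"_source": {"user": {"id": 1, "screenName": "a", "followers": 500}}},
    {"_source": {"user": {"id": 1, "screenName": "a", "followers": 500}}},
    {"_source": {"user": {"id": 2, "screenName": "b", "followers": 300}}},
]}}
print([u["id"] for u in unique_top_users(sample)])  # [1, 2]
```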

--- UPDATE ---

I was able to get a list of the top users, and the query returns quickly. Unfortunately, the list is still not unique. Also, the docs say top_hits is designed to be used as a sub-aggregation, but I am using it as a top-level aggregation:

"aggs": {
    "top_influencers": {
      "top_hits": {
        "sort": [
          {
            "user.followers": {
              "order": "desc"
            }
          }
        ],
        "_source": {
          "include": [
            "user.id",
            "user.screenName",
            "user.followers"
          ]
        },
        "size": 10
      }
    }
  }




Answer


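Try this: a terms aggregation on user.id, with a top_hits sub-aggregation that returns a single hit (the user's details) per bucket: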

{
    "aggs": {
        "GroupByType": {
            "terms": {
                "field": "user.id",
                "size": 10000
            },
            "aggs": {
                "Group": {
                    "top_hits":{
                        "size":1, 
                        "_source": {
                                "includes": ["user.id", "user.screenName", "user.followers"]
                        },
                        "sort":[{
                            "user.followers": {
                                "order": "desc"
                            }
                        }]
                    }
                }
            }
        }
    }
}




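Then take the top 10 users from that response. Since terms buckets are ordered by document count by default, sort the buckets by user.followers on the client side first. Note that a regular search in Elasticsearch only returns up to 10,000 hits by default.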
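The post-processing step can be sketched in Python. This assumes the response has been parsed into a dict with the standard aggregation shape (aggregations.GroupByType.buckets[*].Group.hits.hits); build_query and top_users are hypothetical helper names, and the top-level "size": 0 is an added assumption to skip the regular hits, not part of the query above.

```python
from __future__ import annotations

from typing import Any


def build_query(bucket_size: int = 10000) -> dict[str, Any]:
    """Request body from the answer: one terms bucket per user.id, one top hit per bucket."""
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the regular hits
        "aggs": {
            "GroupByType": {
                "terms": {"field": "user.id", "size": bucket_size},
                "aggs": {
                    "Group": {
                        "top_hits": {
                            "size": 1,
                            "_source": {"includes": ["user.id", "user.screenName", "user.followers"]},
                            "sort": [{"user.followers": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def top_users(response: dict[str, Any], limit: int = 10) -> list[dict[str, Any]]:
    """Take the single top hit from every bucket and keep the `limit` users with most followers."""
    buckets = response["aggregations"]["GroupByType"]["buckets"]
    users = [b["Group"]["hits"]["hits"][0]["_source"]["user"] for b in buckets]
    # Buckets come back ordered by doc_count, so re-sort by follower count here.
    users.sort(key=lambda u: u["followers"], reverse=True)
    return users[:limit]
```

Send build_query() as the JSON body of a _search request with whichever client you use, then pass the parsed response to top_users. Ordering on the server instead, with a max sub-aggregation in the terms "order" as in the question's first attempt, would avoid fetching every bucket, but that is the query that hit the fielddata circuit breaker here.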
