Aggregating sub-objects in order in Elasticsearch
I cannot find the correct syntax for aggregating on a sub-object and ordering the results by a count field.
A good example is this mapping for tweet documents:
{
  "properties" : {
    "id" : {
      "type" : "long"
    },
    "message" : {
      "type" : "string"
    },
    "user" : {
      "type" : "object",
      "properties" : {
        "id" : {
          "type" : "long"
        },
        "screenName" : {
          "type" : "string"
        },
        "followers" : {
          "type" : "long"
        }
      }
    }
  }
}
How can I get the Top Influencers for a given set of tweets? This would be a unique list of the top 10 "user" objects, ordered by the "user.followers" field.
I've tried using top_hits with the query below, but I get an exception:
org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large for [user.id]
"aggs": {
  "top-influencers": {
    "terms": {
      "field": "user.id",
      "order": {
        "top_hit": "desc"
      }
    },
    "aggs": {
      "top_tags_hits": {
        "top_hits": {}
      },
      "top_hit": {
        "max": {
          "field": "user.followers"
        }
      }
    }
  }
}
I can get pretty much what I want using the "sort" field in the query (no aggregation). However, if a user has multiple tweets, they appear more than once in the result. I need to group by the user sub-object and return each user only once.
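One way to work around the duplicates without an aggregation is to drop repeated users on the client. The Python sketch below is only an illustration: the function name `unique_users` is hypothetical, and it assumes the hits are already sorted by "user.followers" descending and carry the "user" object in "_source". It also assumes more than 10 hits were requested, so that enough distinct users remain after deduplication.

```python
from typing import Any, Dict, Iterable, List


def unique_users(hits: Iterable[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each user.id, preserving the incoming order."""
    seen = set()
    result = []
    for hit in hits:
        user = hit["_source"]["user"]
        if user["id"] in seen:
            continue  # this user already appeared with an earlier (higher-ranked) tweet
        seen.add(user["id"])
        result.append(user)
        if len(result) == limit:
            break
    return result
```

This keeps the request simple, but the number of hits to fetch is a guess: if a few users dominate the sorted hits, a single page may not contain 10 distinct users.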
--- UPDATE ---
I was able to get a list of the top users back in very good time. Unfortunately, it is still not unique. Also, the docs say top_hits is designed to be used as a sub-aggregation, but I am using it as a top-level aggregation:
"aggs": {
  "top_influencers": {
    "top_hits": {
      "sort": [
        {
          "user.followers": {
            "order": "desc"
          }
        }
      ],
      "_source": {
        "include": [
          "user.id",
          "user.screenName",
          "user.followers"
        ]
      },
      "size": 10
    }
  }
}
Try this:
{
  "aggs": {
    "GroupByType": {
      "terms": {
        "field": "user.id",
        "size": 10000
      },
      "aggs": {
        "Group": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": ["user.id", "user.screenName", "user.followers"]
            },
            "sort": [{
              "user.followers": {
                "order": "desc"
              }
            }]
          }
        }
      }
    }
  }
}
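This returns one hit per user. The terms buckets are ordered by document count by default, not by followers, so sort them by "user.followers" on the client and take the top 10. Note that a regular Elasticsearch search only goes up to 10,000 records by default (the index.max_result_window setting).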
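Taking the top 10 from that response could look like the following Python sketch. The function name `top_influencers` is hypothetical, and the sketch assumes the response has already been parsed into a dict and uses the aggregation names "GroupByType" and "Group" from the query above. Because terms buckets come back ordered by document count by default, the sketch sorts by followers itself.

```python
from typing import Any, Dict, List


def top_influencers(response: Dict[str, Any], limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` distinct users, highest follower count first."""
    buckets = response["aggregations"]["GroupByType"]["buckets"]
    users = []
    for bucket in buckets:
        hits = bucket["Group"]["hits"]["hits"]
        if hits:  # top_hits size is 1, so there is at most one hit per user bucket
            users.append(hits[0]["_source"]["user"])
    # Each bucket is one user.id, so the list is already unique; only ordering remains.
    users.sort(key=lambda user: user["followers"], reverse=True)
    return users[:limit]
```

Since every bucket corresponds to one user.id, uniqueness comes from the terms aggregation and the client only has to order and truncate the list.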