Elasticsearch: can aggregation results be processed?
I am calculating the duration of my service processes using SUM-Aggregation. Each step of the completed process will be saved in Elasticsearch under the call id.
This is what I track:
Duration of Request-Processing for ID #123 (calling service #1)
Duration of Server-Response for ID #123 (calling service #1)
**Complete Duration for ID #123**
Duration of Request-Processing for ID #124 (calling service #1)
Duration of Server-Response for ID #124 (calling service #1)
**Complete duration for ID #124**
Query:
{
"from" : 0, "size" : 0,
"query" : {
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"term" : {
"callingId" : "123"
}
}
}
},
"aggs" : {
"total_duration" : { "sum" : { "field" : "duration" } },
"max_duration" : { "max" : { "field" : "duration" } },
"min_duration" : { "min" : { "field" : "duration" } }
}
}
This returns the total duration of the process and also tells me which part of the process was fastest and which part was slowest.
Next, I want to calculate the average duration of all finished processes by serviceId. In this case, I only care about the total duration for each service, so I can compare them.
How do I compute the average, minimum and maximum values from my total_durations?
EDIT: I've added some sample data, I hope you can work with it.
Call 1:
{
"callerId":"U1",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U1",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U1",
"operation":"Finish",
"status":"FINISHED",
"duration":1200,
"serviceId":"1"
}
sum: 1202
Call 2:
{
"callerId":"U2",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":2,
"serviceId":"1"
}
{
"callerId":"U2",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U2",
"operation":"Finish",
"status":"FINISHED",
"duration":1030,
"serviceId":"1"
}
sum: 1033
Aggregation over all service calls for service ID #1. This is what I want to calculate:
Max: 1202
Min: 1033
AVG: 1117.5
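To make the target numbers concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) of the two-level calculation: first sum the durations per call, then take max, min and average over those sums. The function name summarize is made up for illustration, and only the fields needed for the calculation are copied from the sample data.

```python
from collections import defaultdict

# Sample documents from above (only the fields needed for the calculation).
docs = [
    {"callerId": "U1", "duration": 1, "serviceId": "1"},
    {"callerId": "U1", "duration": 1, "serviceId": "1"},
    {"callerId": "U1", "duration": 1200, "serviceId": "1"},
    {"callerId": "U2", "duration": 2, "serviceId": "1"},
    {"callerId": "U2", "duration": 1, "serviceId": "1"},
    {"callerId": "U2", "duration": 1030, "serviceId": "1"},
]


def summarize(docs, service_id):
    """Sum durations per call, then take max/min/avg over those sums.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for doc in docs:
        if doc["serviceId"] == service_id:
            totals[doc["callerId"]] += doc["duration"]
    sums = list(totals.values())
    return {"max": max(sums), "min": min(sums), "avg": sum(sums) / len(sums)}


stats = summarize(docs, "1")
print(stats)
```

The point of the sketch is that the second step operates on the results of the first aggregation, which is exactly what I cannot express with plain sum/min/max aggregations.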
It is a little more complicated, but here it goes (only in 1.4 and later, because it relies on the scripted_metric aggregation):
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"serviceId": 1
}
}
}
},
"aggs": {
"executionTimes": {
"scripted_metric": {
"init_script": "_agg['values'] = new java.util.HashMap();",
"map_script": "if (_agg.values[doc['callerId'].value]==null) {_agg.values[doc['callerId'].value]=[doc['duration'].value];} else {_agg.values[doc['callerId'].value].add(doc['duration'].value);}",
"combine_script":"someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x]; sum=0; for(y in value) {sum+=y}; someHashMap.put(x,sum)}; return someHashMap;",
"reduce_script": "finalArray = []; finalMap = new java.util.HashMap(); for(map in _aggs){for(x in map.keySet()){if(finalMap.containsKey(x)){value=finalMap.get(x);finalMap.put(x,value+map.get(x));} else {finalMap.put(x,map.get(x))}}}; finalAvgValue=0; finalMaxValue=-1; finalMinValue=-1; for(key in finalMap.keySet()){currentValue=finalMap.get(key);finalAvgValue+=currentValue; if(finalMinValue<0){finalMinValue=currentValue} else if(finalMinValue>currentValue){finalMinValue=currentValue}; if(currentValue>finalMaxValue) {finalMaxValue=currentValue}}; finalArray.add(finalMaxValue); finalArray.add(finalMinValue); finalArray.add(finalAvgValue/finalMap.size()); return finalArray",
"lang": "groovy"
}
}
}
}
I'm not saying this is the best approach, only the one I could find, and it can probably be refined and improved. I just wanted to show that it is possible. Keep in mind that it is only available from version 1.4 onward.
The main idea of the approach is to use scripts to build a data structure that holds the information you need, computed in several steps as defined by the scripted metric aggregation. Note that the aggregation is done for only one serviceId. If you want to do this for all serviceIds, you might need to rethink the data structure used in the scripts.
For the request above and the exact data you provided, the results are:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"executionTimes": {
"value": [
1202,
1033,
"1117.5"
]
}
}
}
The order of the values in the value array is [max, min, avg], as defined by the script in reduce_script.
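To make the flow of the four scripts easier to follow, here is a rough Python imitation of the stages (init, map, combine, reduce). It is only a sketch of the logic, not Elasticsearch code: the function names are made up, and the split of the six sample documents across two shards is a hypothetical one chosen so that each call is spread over both shards.

```python
def init_state():
    # init_script: each shard starts with an empty map
    return {"values": {}}


def map_doc(state, doc):
    # map_script: collect each document's duration in a list keyed by callerId
    state["values"].setdefault(doc["callerId"], []).append(doc["duration"])


def combine(state):
    # combine_script: on each shard, collapse every list into a partial sum
    return {caller: sum(durations) for caller, durations in state["values"].items()}


def reduce_shards(shard_results):
    # reduce_script: merge the partial sums from all shards,
    # then compute [max, min, avg] over the per-call totals
    totals = {}
    for partial in shard_results:
        for caller, value in partial.items():
            totals[caller] = totals.get(caller, 0) + value
    sums = list(totals.values())
    return [max(sums), min(sums), sum(sums) / len(sums)]


# Hypothetical distribution of the sample documents over two shards.
shards = [
    [
        {"callerId": "U1", "duration": 1},
        {"callerId": "U2", "duration": 2},
        {"callerId": "U1", "duration": 1200},
    ],
    [
        {"callerId": "U1", "duration": 1},
        {"callerId": "U2", "duration": 1},
        {"callerId": "U2", "duration": 1030},
    ],
]

shard_results = []
for shard_docs in shards:
    state = init_state()
    for doc in shard_docs:
        map_doc(state, doc)
    shard_results.append(combine(state))

result = reduce_shards(shard_results)
print(result)
```

The merge step in the reduce stage matters because documents of the same call can live on different shards, so a per-shard sum is only a partial total until the shards are combined.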
The upcoming version 2.0.0 will include a new feature called Reducers. Reducers allow you to calculate aggregations on top of the results of other aggregations.
Related issue: https://github.com/elasticsearch/elasticsearch/issues/8110