Aggregating concurrent events in ElasticSearch

Question

I have a series of documents representing events, each with starts_at and ends_at fields. An event is considered active at a given point in time if that point is after starts_at and before ends_at.

I'm looking for an aggregation that produces a date histogram in which each bucket contains the number of events active in that interval.

So far, the best approximation I have found is to create one set of buckets that counts the number of starts in each interval and a corresponding set of buckets that counts the number of ends, and then post-process them by subtracting the number of ends from the number of starts for each interval:

{
  "size": "0",
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "_type": "event"
            }
          },
          {
            "range": {
              "starts_at": {
                "gte": "2015-06-14T05:25:03Z",
                "lte": "2015-06-21T05:25:03Z"
              }
            }
          }
        ]
      }
    }
  },
  "aggs": {
    "starts": {
      "date_histogram": {
        "field": "starts_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    },
    "ends": {
      "date_histogram": {
        "field": "ends_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    }
  }
}
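The post-processing step described above might look roughly like the following sketch. It assumes the response buckets have the usual date_histogram shape (a list of objects with "key" and "doc_count"); the function name active_counts and the initially_active parameter are hypothetical. A running total of (starts - ends) is used, which gives the number of events still active at the end of each interval. Events that started before the queried window are not seen by the query, so their count has to be supplied separately.

```python
def active_counts(starts_buckets, ends_buckets, initially_active=0):
    """Combine two date_histogram bucket lists into per-interval active counts.

    initially_active is a hypothetical parameter: the number of events
    already active before the first bucket (unknown to the query itself).
    """
    starts = {b["key"]: b["doc_count"] for b in starts_buckets}
    ends = {b["key"]: b["doc_count"] for b in ends_buckets}

    active = initially_active
    result = {}
    for key in sorted(set(starts) | set(ends)):
        # Net change in this interval: events started minus events ended.
        active += starts.get(key, 0) - ends.get(key, 0)
        result[key] = active
    return result


# Made-up buckets keyed by epoch milliseconds, 15 minutes apart.
example_starts = [
    {"key": 0, "doc_count": 2},
    {"key": 900000, "doc_count": 1},
    {"key": 1800000, "doc_count": 0},
]
example_ends = [
    {"key": 0, "doc_count": 0},
    {"key": 900000, "doc_count": 1},
    {"key": 1800000, "doc_count": 2},
]
example_result = active_counts(example_starts, example_ends)
```

This still needs client-side work after the request, which is exactly the drawback the question is about.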

I am looking for something like this solution.

Is there a way to achieve this with a single request?

elasticsearch

Peter Hübel 10 jul. 15 at 11:38

1 answer

NikoNyrh · Answer 1 · 2015-09-28T14:45:03+0000

I'm not 100% sure, but the upcoming pipeline aggregations may solve this problem in a more elegant way in the near future.

In the meantime, you could choose your desired temporal resolution and, at index time, generate an active_at field in addition to the starts_at and ends_at fields. This would be an array of timestamps, and you can use either a terms aggregation (if it is mapped as a not_analyzed string) or a date_histogram aggregation to get the correct count of active events for each time slice.
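As a minimal sketch of how such an active_at array could be generated before indexing, assuming timezone-aware UTC datetimes and a 15-minute resolution (the function name active_at and its resolution parameter are illustrative, not part of any Elasticsearch API):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def active_at(starts_at, ends_at, resolution=timedelta(minutes=15)):
    """Return the aligned timestamps of every time slice the event overlaps.

    Both arguments are assumed to be timezone-aware UTC datetimes.
    """
    # Align the first slice down to a multiple of the resolution.
    slot = starts_at - (starts_at - EPOCH) % resolution
    slots = []
    # The event is active before ends_at, so a slice beginning exactly
    # at ends_at is not included.
    while slot < ends_at:
        slots.append(slot.strftime("%Y-%m-%dT%H:%M:%SZ"))
        slot += resolution
    return slots


example_slots = active_at(
    datetime(2015, 6, 14, 5, 25, 3, tzinfo=timezone.utc),
    datetime(2015, 6, 14, 6, 5, 0, tzinfo=timezone.utc),
)
```

The resulting list would be stored in the document next to starts_at and ends_at. A date_histogram on active_at using the same interval then counts each event once in every slice it overlaps.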

The downside is increased storage requirements and possibly worse performance, since there are more field values to aggregate over. It shouldn't be too bad, though, as long as you don't choose too fine a temporal resolution, such as 1 minute.
