Aggregating concurrent events in ElasticSearch

I have a series of docs representing events, each with starts_at and ends_at fields. An event is considered active at a given point in time if that point is after starts_at and before ends_at.

I'm looking for an aggregation that should result in a date histogram where each bucket contains the number of active events in that interval.

So far, the best approximation I have found is to create one set of buckets that counts the number of starts in each interval and a corresponding set of buckets that counts the number of ends, and then to post-process them by subtracting the number of ends from the number of starts for each interval:

{
  "size": "0",
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "_type": "event"
            }
          },
          {
            "range": {
              "starts_at": {
                "gte": "2015-06-14T05:25:03Z",
                "lte": "2015-06-21T05:25:03Z"
              }
            }
          }
        ]
      }
    }
  },
  "aggs": {
    "starts": {
      "date_histogram": {
        "field": "starts_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    },
    "ends": {
      "date_histogram": {
        "field": "ends_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    }
  }
}
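As a rough sketch of the post-processing step described above, the two histograms could be merged client-side in Python. This assumes the response buckets carry the usual key and doc_count fields; the function name active_counts is hypothetical. It keeps a running total of starts minus ends, which is one interpretation of the subtraction: the number of matched events still active at the end of each interval.

```python
from itertools import accumulate


def active_counts(starts_buckets, ends_buckets):
    """Merge 'starts' and 'ends' date_histogram buckets into running active counts.

    Each bucket is assumed to be a dict with 'key' (epoch millis) and 'doc_count'.
    Returns a list of (key, active_at_end_of_interval) tuples sorted by key.
    """
    starts = {b["key"]: b["doc_count"] for b in starts_buckets}
    ends = {b["key"]: b["doc_count"] for b in ends_buckets}

    # The two histograms may not cover exactly the same keys, so take the union.
    keys = sorted(set(starts) | set(ends))

    # Net change per interval: events that started minus events that ended.
    net = [starts.get(k, 0) - ends.get(k, 0) for k in keys]

    # Running total of the net change gives the count still active after each interval.
    return list(zip(keys, accumulate(net)))
```

Note that an event that starts and ends inside the same interval cancels out, and events that started before the queried range are not counted at all, which is why this remains an approximation.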

      

I am looking for something like this solution.

Is there a way to achieve this with a single request?

1 answer


I'm not 100% sure, but the upcoming pipeline aggregations may solve this problem in a more elegant way in the near future.

In the meantime, you could choose your desired temporal resolution and, at index time, generate an active_at field in addition to the starts_at and ends_at fields. This would be an array of timestamps, and you could then use either a terms aggregation (if the field is mapped as a not_analyzed string) or a date_histogram aggregation to get the correct count of active events for each time slice.
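A minimal sketch of how the active_at values might be generated before indexing, assuming a 15-minute resolution and slots aligned to the Unix epoch in UTC. The helper name active_at_slots and the request body below it are illustrative assumptions, not part of any Elasticsearch client API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def active_at_slots(starts_at, ends_at, resolution=timedelta(minutes=15)):
    """Return ISO timestamps of every resolution-aligned slot the event overlaps.

    starts_at and ends_at are timezone-aware UTC datetimes.
    """
    # Round the start down to the beginning of its slot.
    slot = starts_at - (starts_at - EPOCH) % resolution
    slots = []
    # A slot counts as active if it begins before the event ends.
    while slot < ends_at:
        slots.append(slot.strftime("%Y-%m-%dT%H:%M:%SZ"))
        slot += resolution
    return slots


# Hypothetical request body: a single date_histogram over the generated field,
# written with the same "interval" syntax as the query in the question.
query_body = {
    "size": 0,
    "aggs": {
        "active": {
            "date_histogram": {
                "field": "active_at",
                "interval": "15m",
                "min_doc_count": 0,
            }
        }
    },
}
```

With this approach a document contributes one value per slot it overlaps, so each histogram bucket counts every event that was active at some point during that slot.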



The downside is increased storage requirements and possibly worse performance, since there are more field values to aggregate over. It shouldn't be too bad, though, as long as you don't choose too fine a temporal resolution, for example 1 minute.
