Multi-series date histogram aggregation with the Elasticsearch Java API

I am using the dateHistogram aggregation with the Elasticsearch Java API, and it works great for simple aggregations such as hits per hour / day / month / year (imagine a series of docs where the date histogram aggregation is done on the "indexed_date" field).

But can I create a multi-series date histogram aggregation, broken down by another field, with a single query? Something like what Kibana does for its panels.

An example of what I would like to achieve:

I have a series of documents where each one represents an "event" that has its own timestamp. These documents have a number of fields such as "status", "version", etc.

Can I get an aggregation based on a date histogram on a timestamp field and on all values of another field?

An example of the aggregation result with an interval of one hour:

H: 12 status - {ACTIVE: 34, DESIGNATED: 12}

H: 13 status - {ACTIVE: 10}

EDIT:

Some examples of data:

"doc1" - { timestamp: "2014-12-23 12:01", status: "ACTIVE", version: 1 }
"doc2" - { timestamp: "2014-12-23 12:15", status: "PAUSED", version: 1 }
"doc3" - { timestamp: "2014-12-23 13:55", status: "ACTIVE", version: 2 }
(and so on..)
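
To make the desired grouping concrete, here is a minimal plain-Java sketch (no Elasticsearch involved) that counts events per hour and, within each hour, per status. The class name `HourlyStatusCounts`, the `Event` holder, and the `yyyy-MM-dd HH:mm` timestamp pattern are assumptions made for illustration only.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HourlyStatusCounts {

    // Hypothetical in-memory stand-in for one indexed event document.
    public static final class Event {
        final String timestamp;
        final String status;

        public Event(String timestamp, String status) {
            this.timestamp = timestamp;
            this.status = status;
        }
    }

    // Assumed timestamp layout, e.g. "2014-12-23 12:01".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Outer key: start of the hour. Inner map: status -> number of events.
    public static Map<LocalDateTime, Map<String, Long>> countByHourAndStatus(List<Event> events) {
        Map<LocalDateTime, Map<String, Long>> result = new TreeMap<>();
        for (Event e : events) {
            LocalDateTime hour = LocalDateTime.parse(e.timestamp, FORMAT)
                    .truncatedTo(ChronoUnit.HOURS);
            result.computeIfAbsent(hour, h -> new TreeMap<>())
                  .merge(e.status, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("2014-12-23 12:01", "ACTIVE"),
                new Event("2014-12-23 12:15", "PAUSED"),
                new Event("2014-12-23 13:55", "ACTIVE"));
        // One entry per hour, each holding the per-status counts.
        System.out.println(countByHourAndStatus(events));
    }
}
```

This is the result shape I am hoping to get back from a single Elasticsearch query instead of computing it on the client.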

      



2 answers


I would nest a terms aggregation inside a date histogram aggregation.

In the example below, you can see the number of documents returned for each distinct status value within each hourly bucket:



curl -XGET 'http://localhost:9200/myindex/mydata/_search?search_type=count&pretty' -d '
{
  "query" : {
    "match_all" : { }
  },
  "aggs" : {
    "date_hist_agg" : {
      "date_histogram" : { "field" : "timestamp", "interval" : "hour" },
      "aggs" : {
        "status_agg" : {
          "terms" : { "field" : "status" }
        }
      }
    }
  }
}'

{
  "took" : 213,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_hist_agg" : {
      "buckets" : [ {
        "key_as_string" : "2014-12-23T17:00:00.000Z",
        "key" : 1419354000000,
        "doc_count" : 2,
        "status_agg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "active",
            "doc_count" : 1
          }, {
            "key" : "paused",
            "doc_count" : 1
          } ]
        }
      }, {
        "key_as_string" : "2014-12-23T18:00:00.000Z",
        "key" : 1419357600000,
        "doc_count" : 1,
        "status_agg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "active",
            "doc_count" : 1
          } ]
        }
      } ]
    }
  }
}
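
In the response above, each date histogram bucket carries a numeric `key` (the bucket start in epoch milliseconds) alongside `key_as_string`. If you only keep the numeric key, a small sketch like the following converts it back to a readable UTC instant with `java.time`; the class and method names are hypothetical helpers, not part of any Elasticsearch API.

```java
import java.time.Instant;

public class BucketKeyDecoder {

    // Converts a bucket key (epoch milliseconds) to an ISO-8601 UTC string.
    public static String toIsoInstant(long bucketKeyMillis) {
        return Instant.ofEpochMilli(bucketKeyMillis).toString();
    }

    public static void main(String[] args) {
        // Keys taken from the two buckets in the response above.
        System.out.println(toIsoInstant(1419354000000L)); // 2014-12-23T17:00:00Z
        System.out.println(toIsoInstant(1419357600000L)); // 2014-12-23T18:00:00Z
    }
}
```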

      



Using the same aggregation names used in the previous answer, I would do the following:



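    // Written against the Elasticsearch 1.x Java API; later versions renamed some of these types.
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogram;
    import org.elasticsearch.search.aggregations.bucket.terms.Terms;

    // Placeholder class name; "client" is assumed to be an already-connected client.
    public class StatusHistogramSearch {

        private final Client client;

        public StatusHistogramSearch(Client client) {
            this.client = client;
        }

        public void yourSearch(String indexName, String typeName) {
            // Hourly date histogram on "timestamp" with a terms sub-aggregation on "status".
            SearchResponse sr = client.prepareSearch(indexName)
                    .setTypes(typeName)
                    .addAggregation(AggregationBuilders.dateHistogram("date_hist_agg")
                            .field("timestamp")
                            .interval(DateHistogram.Interval.hours(1))
                            .minDocCount(0) // also return hours that contain no documents
                            .subAggregation(AggregationBuilders.terms("status_agg").field("status")))
                    .execute().actionGet();

            DateHistogram componentsAgg = sr.getAggregations().get("date_hist_agg");
            for (DateHistogram.Bucket entry : componentsAgg.getBuckets()) {
                String hour = entry.getKey(); // start of the hourly bucket, as a formatted string

                Terms statusAgg = entry.getAggregations().get("status_agg");
                for (Terms.Bucket entry2 : statusAgg.getBuckets()) {
                    String key = entry2.getKey();      // status value, e.g. "active"
                    long cnt = entry2.getDocCount();   // documents with that status in this hour

                    // use hour, key and cnt
                }
            }
        }
    }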
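
One possible way to "use the key, cnt" in the inner loop above is to collect the values into a nested map keyed first by bucket and then by status. The following is only a sketch in plain Java: `StatusCountsByBucket` and its methods are hypothetical names and do not belong to the Elasticsearch API.

```java
import java.util.Map;
import java.util.TreeMap;

public class StatusCountsByBucket {

    // Outer key: histogram bucket key. Inner map: status -> document count.
    private final Map<String, Map<String, Long>> counts = new TreeMap<>();

    // Call this once per (bucket, status) pair from the inner loop.
    public void add(String bucketKey, String status, long docCount) {
        counts.computeIfAbsent(bucketKey, k -> new TreeMap<>())
              .merge(status, docCount, Long::sum);
    }

    // Returns 0 when the bucket or the status was never recorded.
    public long get(String bucketKey, String status) {
        Map<String, Long> perStatus = counts.get(bucketKey);
        if (perStatus == null) {
            return 0L;
        }
        Long count = perStatus.get(status);
        return count == null ? 0L : count;
    }

    public int bucketCount() {
        return counts.size();
    }

    public static void main(String[] args) {
        // Values copied from the sample response in the first answer.
        StatusCountsByBucket result = new StatusCountsByBucket();
        result.add("2014-12-23T17:00:00.000Z", "active", 1);
        result.add("2014-12-23T17:00:00.000Z", "paused", 1);
        result.add("2014-12-23T18:00:00.000Z", "active", 1);
        System.out.println(result.get("2014-12-23T17:00:00.000Z", "paused"));
    }
}
```

Inside the loops you would call something like `result.add(entry.getKey(), key, cnt)` and return `result` from the method instead of `void`.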

      
