Aggregating a multi-series date histogram with the Elasticsearch Java API
I am using the dateHistogram aggregation with the Elasticsearch Java API and it works great for simple aggregations such as hits per hour / day / month / year (imagine a set of docs where a date histogram aggregation is run on the "indexed_date" field).
But can I build a multi-series date aggregation, split by another field, with a single query? Something like what Kibana does for its charts.
An example of what I would like to achieve:
I have a series of documents where each one represents an "event" that has its own timestamp. These documents have a number of fields such as "status", "version", etc.
Can I get an aggregation based on a date histogram on a timestamp field, broken down by all values of another field?
An example of the aggregation result with an interval of one hour:
H: 12 statuses - {ACTIVE: 34, DESIGNATED: 12}
H: 13 statuses - {ACTIVE: 10}
EDIT:
Some examples of data:
"doc1" - { timestamp: "2014-12-23 12:01", status: "ACTIVE", version: 1 }
"doc2" - { timestamp: "2014-12-23 12:15", status: "PAUSED", version: 1 }
"doc3" - { timestamp: "2014-12-23 13:55", status: "ACTIVE", version: 2 }
(and so on..)
I would nest a terms aggregation inside a date histogram aggregation.
In the example below you can see, for each hourly bucket, the number of documents returned for each status value:
curl -XGET 'http://localhost:9200/myindex/mydata/_search?search_type=count&pretty' -d '
{
  "query" : {
    "match_all" : { }
  },
  "aggs" : {
    "date_hist_agg" : {
      "date_histogram" : {"field" : "timestamp", "interval" : "hour"},
      "aggs" : {
        "status_agg" : {
          "terms" : { "field" : "status" }
        }
      }
    }
  }
}'
{
"took" : 213,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"date_hist_agg" : {
"buckets" : [ {
"key_as_string" : "2014-12-23T17:00:00.000Z",
"key" : 1419354000000,
"doc_count" : 2,
"status_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "active",
"doc_count" : 1
}, {
"key" : "paused",
"doc_count" : 1
} ]
}
}, {
"key_as_string" : "2014-12-23T18:00:00.000Z",
"key" : 1419357600000,
"doc_count" : 1,
"status_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "active",
"doc_count" : 1
} ]
}
} ]
}
}
}
Using the same aggregation names as in the previous answer, I would do the following:
public void yourSearch(String indexName, String typeName) {
    SearchResponse sr = client.prepareSearch(indexName)
            .setTypes(typeName)
            .addAggregation(AggregationBuilders.dateHistogram("date_hist_agg")
                    .field("timestamp")
                    .interval(DateHistogram.Interval.hours(1))
                    .minDocCount(0)
                    .subAggregation(AggregationBuilders.terms("status_agg").field("status")))
            .execute().actionGet();

    // Outer loop: one bucket per hour
    DateHistogram dateHistAgg = sr.getAggregations().get("date_hist_agg");
    for (DateHistogram.Bucket entry : dateHistAgg.getBuckets()) {
        // Inner loop: one bucket per status value within that hour
        Terms statusAgg = entry.getAggregations().get("status_agg");
        for (Terms.Bucket entry2 : statusAgg.getBuckets()) {
            String key = entry2.getKey();
            long cnt = entry2.getDocCount();
            // use the key, cnt pair here
        }
    }
}
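To turn the nested buckets into the per-hour summary lines shown in the question, the inner loop can collect counts into a nested map keyed by hour and status. Here is a minimal, self-contained sketch of that post-processing step; it uses plain collections and hard-coded hypothetical counts in place of real bucket data, and the class and method names are illustrative, not part of the Elasticsearch API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatusSummary {
    // hour -> (status -> doc count); in the real search this map would be
    // filled inside the two bucket loops instead of hard-coded in main()
    static Map<Integer, Map<String, Long>> byHour = new LinkedHashMap<>();

    // record one (hour, status, count) triple, creating the inner map on demand
    static void add(int hour, String status, long count) {
        byHour.computeIfAbsent(hour, h -> new LinkedHashMap<>()).put(status, count);
    }

    // format one hour's statuses in the style used in the question
    static String line(int hour) {
        return "H: " + hour + " statuses - " + byHour.get(hour);
    }

    public static void main(String[] args) {
        add(12, "ACTIVE", 34L);   // hypothetical counts matching the question
        add(12, "PAUSED", 12L);
        add(13, "ACTIVE", 10L);
        System.out.println(line(12)); // prints: H: 12 statuses - {ACTIVE=34, PAUSED=12}
        System.out.println(line(13)); // prints: H: 13 statuses - {ACTIVE=10}
    }
}
```

A LinkedHashMap is used so that both the hour buckets and the status keys print in insertion order, matching the order in which the aggregation buckets are iterated.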