ElasticSearch - Statistical Facet by List Length
I have the following example mapping:
{ "book": { "properties": { "author": {"type": "string"}, "title": {"type": "string"}, "reviews": { "properties": { "url": {"type": "string"}, "score": {"type": "integer"} } }, "chapters": { "include_in_root": 1, "type": "nested", "properties": { "name": {"type": "string"} } } } } }
I would like to get a facet by the number of reviews, i.e. the length of the reviews array. For example, the results would tell me: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."
I am trying the following statistical facet:
{ "query": { "match_all": {} }, "facets": { "stat1": { "statistical": {"script": "doc['reviews.score'].values.size()"} } } }
but it keeps failing:
{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup] [Near : {... doc[reviews.score].values.size() ....}] ^ [Line: 1, Column: 5]]; }]", "status": 500 }
How can I achieve my goal?
ElasticSearch version is 0.19.9.
Here is my data:
{ "author": "Mark Twain", "title": "The Adventures of Tom Sawyer", "reviews": [ { "url": "amazon.com", "score": 10 }, { "url": "www.barnesandnoble.com", "score": 9 } ], "chapters": [ {"name": "Chapter 1"}, {"name": "Chapter 2"} ] } { "author": "Jack London", "title": "The Call of the Wild", "reviews": [ { "url": "amazon.com", "score": 8 }, { "url": "www.barnesandnoble.com", "score": 9 }, { "url": "www.books.com", "score": 5 } ], "chapters": [ {"name": "Chapter 1"}, {"name": "Chapter 2"} ] }
It looks like you are using curl to execute your request, with a command along the lines of curl localhost:9200/my-index/book -d '{....}'
The problem is that the request body is wrapped in apostrophes, so every apostrophe inside it has to be escaped. Otherwise the shell strips them, which is why the error message shows doc[reviews.score] without any quotes. Your script should therefore become:
{"script" : "doc['\''reviews.score'\''].values.size()"}
or
{"script" : "doc[\"reviews.score\"].values.size()"}
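To see what the shell actually passes to curl, you can print the request body instead of sending it. This is only a sketch of the quoting behaviour; the variable names broken and body are made up for illustration.

```shell
# Unescaped apostrophes end and restart the single-quoted string,
# so the shell silently drops them from the script.
broken='{"script" : "doc['reviews.score'].values.size()"}'
printf '%s\n' "$broken"

# Each '\'' closes the quoted string, adds a literal apostrophe,
# and reopens the quoted string, so the apostrophes survive.
body='{"script" : "doc['\''reviews.score'\''].values.size()"}'
printf '%s\n' "$body"
```

The first line printed contains doc[reviews.score], the same text that appears in the error message; the second contains doc['reviews.score'] as intended.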
The second problem is that, from your description, it looks like you want a histogram facet or a range facet rather than a statistical facet. So I would suggest trying something like this:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"key_script" : "doc[\"reviews.score\"].values.size()",
"value_script" : "doc[\"reviews.score\"].values.size()",
"interval" : 1
}
}
}
}'
The third problem is that the facet script is executed for every document in the result set, which can take a very long time when there are many results. I would therefore suggest indexing an additional field named number_of_reviews, populated by your client with the number of reviews. Your request then simply becomes:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"field" : "number_of_reviews",
"interval" : 1
}
}
}
}'
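For the number_of_reviews approach, the client has to write the count into each document at index time. A minimal sketch, assuming the test-idx index and book type used in the request, a node on localhost:9200, and a hypothetical document ID of 1; the value 2 is simply the length of the reviews array in the first sample document:

```shell
# Index the first sample document with a precomputed review count.
# The ID "1" and the number_of_reviews field are illustrative assumptions.
curl -XPUT "localhost:9200/test-idx/book/1" -d '{
    "author" : "Mark Twain",
    "title" : "The Adventures of Tom Sawyer",
    "reviews" : [
        { "url" : "amazon.com", "score" : 10 },
        { "url" : "www.barnesandnoble.com", "score" : 9 }
    ],
    "chapters" : [
        { "name" : "Chapter 1" },
        { "name" : "Chapter 2" }
    ],
    "number_of_reviews" : 2
}'
```

The count has to be kept in sync by the client whenever reviews are added or removed.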