ElasticSearch - Statistical Facet on the Length of a List

I have the following example mapping:

{
    "book": {
        "properties": {
                        "author": {"type": "string"},
                        "title": {"type": "string"},
                        "reviews": {
                                "properties": {
                                        "url": {"type": "string"},
                                        "score": {"type": "integer"}
                                }
                        },
                        "chapters": {
                                "include_in_root": 1,
                                "type": "nested",
                                "properties": {
                                        "name": {"type": "string"}
                                }
                        }
                }
        }
}

I would like to get a facet by the number of reviews, i.e. the length of the reviews array. For example, the results would tell me something like: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."

I am trying the following statistical facet:

{
    "query": {
        "match_all": {}
    },
    "facets": {
        "stat1": {
            "statistical": {"script": "doc['reviews.score'].values.size()"}
        }
    }
}

but it keeps failing:

{
  "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
                 ^
[Line: 1, Column: 5]]; }]",
  "status": 500
}

How can I achieve my goal?

ElasticSearch version is 0.19.9.

Here is my data:

{
        "author": "Mark Twain",
        "title": "The Adventures of Tom Sawyer",
        "reviews": [
                {
                        "url": "amazon.com",
                        "score": 10
                },
                {
                        "url": "www.barnesandnoble.com",
                        "score": 9
                }
        ],
        "chapters": [
                {"name": "Chapter 1"}, {"name": "Chapter 2"}
        ]
}

{
        "author": "Jack London",
        "title": "The Call of the Wild",
        "reviews": [
                {
                        "url": "amazon.com",
                        "score": 8
                },
                {
                        "url": "www.barnesandnoble.com",
                        "score": 9
                },
                {
                        "url": "www.books.com",
                        "score": 5
                }
        ],
        "chapters": [
                {"name": "Chapter 1"}, {"name": "Chapter 2"}
        ]
}

1 answer


It looks like you are using curl to execute your request, and the curl command looks something like curl localhost:9200/my-index/book -d '{....}'

The problem is that since you are using apostrophes to wrap the request body, you need to escape all the apostrophes it contains. So your script should become:

{"script" : "doc['\''reviews.score'\''].values.size()"}

      

or

{"script" : "doc[\"reviews.score\"].values.size()"}
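
If you want to check what request body actually survives the shell quoting, here is a small sketch that uses Python's shlex module as a stand-in for the shell. It only approximates POSIX quoting rules, the curl command lines are shortened, hypothetical versions of your request, and the helper name script_seen_by_server is my own.

```python
import json
import shlex

# Command lines as they would be typed in a POSIX shell. The request body is
# wrapped in apostrophes in all three cases (shortened, hypothetical commands).
escaped_apostrophes = r"""curl -d '{"script" : "doc['\''reviews.score'\''].values.size()"}'"""
escaped_quotes = r"""curl -d '{"script" : "doc[\"reviews.score\"].values.size()"}'"""
unescaped = r"""curl -d '{"script" : "doc['reviews.score'].values.size()"}'"""


def script_seen_by_server(command_line):
    """Split the command line roughly like a shell would and return the
    "script" value from the JSON body (the last argument)."""
    body = shlex.split(command_line)[-1]
    return json.loads(body)["script"]


for name, cmd in [
    ("escaped apostrophes", escaped_apostrophes),
    ("escaped double quotes", escaped_quotes),
    ("unescaped apostrophes", unescaped),
]:
    print(name, "->", script_seen_by_server(cmd))
```

In this sketch the unescaped variant loses its apostrophes and arrives as doc[reviews.score].values.size(), which matches the fragment quoted in your error message.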

      



The second problem is that, from your description, it looks like you are looking for a histogram facet or a range facet rather than a statistical facet. So I would suggest trying something like this:

curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "key_script" : "doc[\"reviews.score\"].values.size()",
                "value_script" : "doc[\"reviews.score\"].values.size()",
                "interval" : 1
            }
        }        
    }
}'
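
To make it concrete what this histogram is expected to count, here is a client-side sketch over the two sample documents from the question (trimmed to the reviews field). It is not how Elasticsearch computes the facet internally; the function name review_count_histogram is hypothetical, and the bucketing simply rounds each count down to a multiple of the interval.

```python
from collections import Counter

# The two sample books from the question, reduced to what matters here.
books = [
    {"title": "The Adventures of Tom Sawyer",
     "reviews": [{"score": 10}, {"score": 9}]},
    {"title": "The Call of the Wild",
     "reviews": [{"score": 8}, {"score": 9}, {"score": 5}]},
]


def review_count_histogram(docs, interval=1):
    """Count documents per bucket, keyed by the number of reviews
    rounded down to a multiple of `interval`."""
    buckets = Counter()
    for doc in docs:
        n = len(doc.get("reviews", []))
        buckets[(n // interval) * interval] += 1
    return dict(sorted(buckets.items()))


print(review_count_histogram(books))  # {2: 1, 3: 1}
```

For the sample data this reads as "1 document with 2 reviews, 1 document with 3 reviews", which is the shape of answer you described.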

      

The third problem is that the facet script will be called for every single record in the result set, and if you have a lot of results this can take a very long time. Therefore, I would suggest indexing an additional field called number_of_reviews, populated by your client with the number of reviews. Then your request would simply become:

curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "number_of_reviews",
                "interval" : 1
            }
        }        
    }
}'
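
Populating number_of_reviews on the client side could look roughly like the sketch below. I am assuming your client builds each book as a plain dictionary before sending it to Elasticsearch; the helper name with_review_count is hypothetical, and the actual indexing call is left out.

```python
import json


def with_review_count(book):
    """Return a copy of the book with number_of_reviews set to the
    length of its reviews list (0 if the list is missing or empty)."""
    enriched = dict(book)
    enriched["number_of_reviews"] = len(book.get("reviews") or [])
    return enriched


book = {
    "author": "Mark Twain",
    "title": "The Adventures of Tom Sawyer",
    "reviews": [
        {"url": "amazon.com", "score": 10},
        {"url": "www.barnesandnoble.com", "score": 9},
    ],
    "chapters": [{"name": "Chapter 1"}, {"name": "Chapter 2"}],
}

# This JSON string is what the client would send as the document body.
print(json.dumps(with_review_count(book), indent=4))
```

The count has to be recomputed whenever a document's reviews change, so the field is only as accurate as the client that writes it.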

      
