Elasticsearch: array element matches counted in the score
I am using Elasticsearch to search several array fields of a document type that looks something like this:
t1 = { field1: ["foo", "bar"],
       field2: ["foo", "foo", "foo", "foo"],
       field3: ["foo", "foo", "foo", "foo", "foo", "foo"]
}
Then I use a multi_match query to find matches, something like:
multi_match: { query: "foo",
               fields: ["field*"]
}
When computing the score for t1, Elasticsearch takes the matches in field1, field2 and field3 into account, which is what I want. However, the fields don't contribute equally: field3 contributes more to the score because "foo" occurs more often there (all values of an array are indexed into the same field, so the term frequency is higher).
What I want instead is to score each array field without summing over all of its entries, simply taking the score of the best entry. In my example, all three fields would then get the same score, since each of them contains the same exact match.
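To make the difference concrete, here is a deliberately simplified JavaScript model. It is not Lucene's actual scoring, which also applies a sub-linear term-frequency function, idf and field-length normalization; here each array element that equals the query term simply counts as one match. The helper names elementScore, sumScore and maxScore are hypothetical, for illustration only.

```javascript
// Toy model only: a field's contribution is derived from how many array
// elements equal the query term. Real Lucene scoring is more involved,
// but the shape of the problem is the same: more matching elements,
// higher score.
const t1 = {
  field1: ["foo", "bar"],
  field2: ["foo", "foo", "foo", "foo"],
  field3: ["foo", "foo", "foo", "foo", "foo", "foo"],
};

// 1 if the element matches the term, 0 otherwise (hypothetical helper).
function elementScore(element, term) {
  return element === term ? 1 : 0;
}

// Current behaviour: every matching element adds to the field's score.
function sumScore(values, term) {
  return values.reduce((acc, v) => acc + elementScore(v, term), 0);
}

// Desired behaviour: only the best-matching element counts.
function maxScore(values, term) {
  return Math.max(0, ...values.map((v) => elementScore(v, term)));
}

for (const [field, values] of Object.entries(t1)) {
  console.log(field, "sum:", sumScore(values, "foo"), "max:", maxScore(values, "foo"));
}
// field1 sum: 1 max: 1
// field2 sum: 4 max: 1
// field3 sum: 6 max: 1
```

With the sum, field3 dominates; with the max, all three fields contribute the same amount, which is the behaviour the question is after.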
This question has already been asked on the Elasticsearch forum but has not been answered yet.
I was stumped by this myself; it seems to me that there should be a simple, built-in way to specify max instead of sum.
I'm not sure this is exactly what you want, because it throws away the match score of each individual array element. You don't get the maximum score of the single best-matching element, just a boolean saying whether anything matched. If your case is a little more nuanced (say, a person's full name, where a match on both first and last name should score higher than a match on only one of them), this may not be acceptable, because you are discarding the scores.
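The trade-off can be illustrated with another simplified JavaScript model. Here an array element's score is just the number of query words it contains: maxElementScore keeps the best element (what the question asks for), while anyMatch mimics the 0/1 result that a filter with a weight of 1 produces. The function names and the sample name data are hypothetical.

```javascript
// Toy model (hypothetical): an element's score is the number of query
// words it contains, after lowercasing and splitting on whitespace.
function wordMatches(element, queryWords) {
  const words = new Set(element.toLowerCase().split(/\s+/));
  return queryWords.filter((w) => words.has(w)).length;
}

// Max over elements: keeps the score of the single best-matching element.
function maxElementScore(values, queryWords) {
  return Math.max(0, ...values.map((v) => wordMatches(v, queryWords)));
}

// Boolean match, as with a filter plus weight 1: 1 if any element
// matches at least one query word, 0 otherwise.
function anyMatch(values, queryWords) {
  return values.some((v) => wordMatches(v, queryWords) > 0) ? 1 : 0;
}

const query = ["john", "smith"];
const docA = { names: ["John Smith", "Jane Doe"] };
const docB = { names: ["John Doe", "Jane Doe"] };

console.log(maxElementScore(docA.names, query), maxElementScore(docB.names, query)); // 2 1
console.log(anyMatch(docA.names, query), anyMatch(docB.names, query)); // 1 1
```

The max keeps docA ahead of docB, while the boolean version scores them the same; that lost distinction is the cost of the workaround below.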
If that's acceptable, this workaround seems to work:
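{function_score: {
  // Base query: selects documents that match in at least one of the fields
  query: {bool: {should: [
    {term: {field1: 'foo'}},
    {term: {field2: 'foo'}},
    {term: {field3: 'foo'}},
  ]}},
  // Each function adds a flat weight of 1 if its field matches at all,
  // no matter how many array elements contain 'foo'
  functions: [
    {filter: {term: {field1: 'foo'}}, weight: 1},
    {filter: {term: {field2: 'foo'}}, weight: 1},
    {filter: {term: {field3: 'foo'}}, weight: 1},
  ],
  score_mode: 'sum',      // add up the weights of the matching functions
  boost_mode: 'replace',  // discard the original query score
}}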
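If there are more than a few fields, the repeated clauses can be generated instead of written by hand, which also avoids copy-paste slips in field names. Below is a minimal JavaScript sketch; buildMaxPerFieldQuery is a hypothetical helper that only builds the request body (the object you would send to the _search endpoint) and does not talk to Elasticsearch.

```javascript
// Builds the function_score workaround for any list of fields.
// buildMaxPerFieldQuery is a hypothetical helper, not an Elasticsearch API.
function buildMaxPerFieldQuery(fields, term) {
  const termClause = (field) => ({ term: { [field]: term } });
  return {
    function_score: {
      // Selects documents that match in at least one field.
      query: { bool: { should: fields.map(termClause) } },
      // One flat weight per field, applied only if that field matches.
      functions: fields.map((field) => ({ filter: termClause(field), weight: 1 })),
      score_mode: "sum",     // add up the weights of the matching functions
      boost_mode: "replace", // ignore the original query score
    },
  };
}

const body = { query: buildMaxPerFieldQuery(["field1", "field2", "field3"], "foo") };
console.log(JSON.stringify(body, null, 2));
```

Generating both lists from the same fields array guarantees that every field in the should clause has a matching weight function.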
We still need the query part to produce the results that the functions then score, even though we discard its score. Conceptually it is just a filter, but simply wrapping the same query in a "filtered" query doesn't work. There might be a better option here.
Through their weight, the functions basically give 1 if there is a match in that field and 0 otherwise. score_mode tells Elasticsearch to sum those weights; in your example all three fields match, so we get 3. boost_mode determines how this function score is combined with the original query score: "replace" tells it to ignore the original query score (which has the problem you mentioned, where multiple matches within an array are summed). So the overall score for this document is 3, because there are 3 matching fields.
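The scoring described above can be checked with a small local simulation. This is a JavaScript sketch of the semantics as this answer describes them, not Elasticsearch itself: functionScoreSum and combine are hypothetical helpers, only two boost modes are modelled, and the original query score of 2.5 is a made-up number, since with "replace" it does not affect the result.

```javascript
// Local simulation of the workaround's scoring (not Elasticsearch itself).
const t1 = {
  field1: ["foo", "bar"],
  field2: ["foo", "foo", "foo", "foo"],
  field3: ["foo", "foo", "foo", "foo", "foo", "foo"],
};

// Mirrors a term filter on an array field: true if any element equals the term.
function termFilterMatches(doc, field, term) {
  return Array.isArray(doc[field]) && doc[field].includes(term);
}

// score_mode "sum": add the weight of every function whose filter matches.
function functionScoreSum(doc, fields, term, weight = 1) {
  let total = 0;
  for (const field of fields) {
    if (termFilterMatches(doc, field, term)) total += weight;
  }
  return total;
}

// boost_mode: how the function score is combined with the query score.
function combine(queryScore, functionScore, boostMode) {
  switch (boostMode) {
    case "replace":
      return functionScore; // original query score discarded
    case "multiply":
      return queryScore * functionScore; // Elasticsearch's default boost_mode
    default:
      throw new Error(`boost_mode not modelled here: ${boostMode}`);
  }
}

const fields = ["field1", "field2", "field3"];
const fnScore = functionScoreSum(t1, fields, "foo");
console.log(fnScore); // 3
console.log(combine(2.5, fnScore, "replace")); // 3
console.log(combine(2.5, fnScore, "multiply")); // 7.5
```

With "replace", the per-array term counts that inflate the query score never reach the final score; with "multiply" they would still scale it.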
It feels more complicated than it should be, but in my relatively limited testing I haven't noticed any performance issues or other problems. I'd love to see a better answer if anyone more familiar with Elasticsearch has one.