How do I promote specific documents for a given search query in Elasticsearch?

I need your help designing an index for a real scenario. This may be a long question, so I will try to explain it as briefly as possible.

We are building a search platform based on Elasticsearch to provide site search for our clients. The documents in the index look something like this:

{ "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1", "Description":"The description of doc 1", ... }
{ "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2", "Description":"The description of doc 2", ... }
{ "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3", "Description":"The description of doc 3", ... }
...

      

For each request, the returned documents are sorted by relevance by default, but our client also wants to promote specific documents for specific keywords.

They gave us the following XML boost configuration:

<boost>
    <Keywords value="keyword1">
        <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
    </Keywords>

    <Keywords value="keyword2">
        <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
    </Keywords>

    <Keywords value="keyword3">
        <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
</boost>

      

This means that if the user searches for "keyword1", the first result should be the document whose Path field is "http://www.foo.com/doc/abc/1", regardless of that document's relevance score. Similarly, if the user searches for "keyword3", the top three results should be the documents whose Path values are "http://www.foo.com/doc/abc/3", "http://www.foo.com/doc/abc/2" and "http://www.foo.com/doc/abc/1", in that order.

To meet this requirement, my design first inverts the original XML so that it is keyed by path instead of by keyword:

<boost>
    <Path value="http://www.foo.com/doc/abc/1">
        <keywords>
           <keyword value="keyword1" rank="10000" />
           <keyword value="keyword2" rank="9900" />
           <keyword value="keyword3" rank="9800" />
        </keywords>
    </Path>

    <Path value="http://www.foo.com/doc/abc/2">
        <keywords>
           <keyword value="keyword2" rank="10000" />
           <keyword value="keyword3" rank="9900" />
        </keywords>
    </Path> 
    <Path value="http://www.foo.com/doc/abc/3">
        <keywords>
           <keyword value="keyword3" rank="10000" />
        </keywords>
    </Path>
</boost>   
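
The inversion step could be scripted along these lines. This is only a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `invert_boost_config` and the shortened sample XML are my own, and it produces a plain dictionary keyed by path rather than the inverted XML itself, since that is what gets merged into the documents afterwards.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the client's keyword-first configuration.
BOOST_XML = """
<boost>
    <Keywords value="keyword1">
        <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
    <Keywords value="keyword2">
        <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
</boost>
"""


def invert_boost_config(xml_text):
    """Return {path: [{"keyword": ..., "rank": ...}, ...]} (hypothetical helper)."""
    by_path = defaultdict(list)
    root = ET.fromstring(xml_text)
    for keywords in root.findall("Keywords"):
        keyword = keywords.get("value")
        for path in keywords.findall("Path"):
            # Each <Path> becomes one keyword/rank entry under its URL.
            by_path[path.text.strip()].append(
                {"keyword": keyword, "rank": int(path.get("rank"))}
            )
    return dict(by_path)


boost_by_path = invert_boost_config(BOOST_XML)
```

The resulting `boost_by_path` maps each URL to the list of keyword/rank pairs that should promote it.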

      

Then I add a nested "Boost" field, containing an array of keyword/rank pairs, to each Elasticsearch document, as in the following example:

{
  "Boost": [
     { "keyword":"keyword1", "rank": 10000},
     { "keyword":"keyword2", "rank": 9900},
     { "keyword":"keyword3", "rank": 9800}
  ],
  "Path":"http://www.foo.com/doc/abc/1",
  "Title":"Title 1",
  "Description":"The description of doc 1",
   ...
}

{
  "Boost": [
     { "keyword":"keyword2", "rank": 10000},
     { "keyword":"keyword3", "rank": 9900}
  ],
  "Path":"http://www.foo.com/doc/abc/2",
  "Title":"Title 2",
  "Description":"The description of doc 2",
   ...
}

{
  "Boost": [
     { "keyword":"keyword3", "rank": 10000}
  ],
  "Path":"http://www.foo.com/doc/abc/3",
  "Title":"Title 3",
  "Description":"The description of doc 3",
   ...
}
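
For reference, here is roughly how the mapping and the enrichment step might look. This is a sketch under assumptions: the mapping uses the typeless format of recent Elasticsearch versions (7.x and later), the field types for Path, Title and Description are my guess, and `attach_boost` is a hypothetical helper, not part of any client library.

```python
# Assumed index body: "Boost" is mapped as nested so that each
# keyword/rank pair is matched as a unit, not as two flattened arrays.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "Path": {"type": "keyword"},
            "Title": {"type": "text"},
            "Description": {"type": "text"},
            "Boost": {
                "type": "nested",
                "properties": {
                    "keyword": {"type": "keyword"},
                    "rank": {"type": "integer"},
                },
            },
        }
    }
}


def attach_boost(doc, boost_by_path):
    """Return a copy of doc with its Boost array filled in (hypothetical helper)."""
    enriched = dict(doc)
    # Documents without any configured promotion get an empty array.
    enriched["Boost"] = boost_by_path.get(doc["Path"], [])
    return enriched


# Example input: one document plus a tiny path-keyed boost table.
boost_table = {
    "http://www.foo.com/doc/abc/3": [{"keyword": "keyword3", "rank": 10000}],
}
doc3 = {
    "Path": "http://www.foo.com/doc/abc/3",
    "Title": "Title 3",
    "Description": "The description of doc 3",
}
indexed_doc3 = attach_boost(doc3, boost_table)
```

Without the nested type, a search for "keyword3" with rank 10000 could match a document that has "keyword3" at rank 9800 and some other keyword at rank 10000, which is why I think nested is needed here.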

      

Then, at query time, use a subquery to get the rank value for each matched document for a given search keyword, and then use the script score to adjust the relevance score with that rank value.

Since the rank values in the boost XML are much greater than a normal relevance score (typically less than 5), the adjusted score of any document configured in the boost XML for a given keyword should put it at the top of the results.
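
To make the idea concrete, the query I have in mind could be built roughly like this. It is only a sketch with several assumptions: "Boost" is mapped as a nested field with "Boost.keyword" as an exact-match keyword field, the whole search string is compared against the configured keyword, and I use `field_value_factor` in place of a script to read the rank, since it seems simpler. The function name `build_promoted_query` is hypothetical.

```python
def build_promoted_query(user_query):
    """Build a search body that adds the configured rank to the relevance score."""
    return {
        "query": {
            "bool": {
                # A bool query with only "should" clauses needs at least one
                # to match; the scores of all matching clauses are summed.
                "should": [
                    {
                        "multi_match": {
                            "query": user_query,
                            "fields": ["Title", "Description"],
                        }
                    },
                    {
                        "nested": {
                            "path": "Boost",
                            # Use the best-scoring nested entry.
                            "score_mode": "max",
                            "query": {
                                "function_score": {
                                    "query": {
                                        "term": {"Boost.keyword": user_query}
                                    },
                                    # Replace the term score with the rank value.
                                    "field_value_factor": {"field": "Boost.rank"},
                                    "boost_mode": "replace",
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


search_body = build_promoted_query("keyword1")
```

With this shape, a promoted document scores its normal relevance plus its rank, so rank 10000 sorts above rank 9900, and both sort above unpromoted documents.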

Do you think this is a good design in Elasticsearch? Any suggestions for better approaches?

Thanks in advance!

1 answer


It may be better to index the promoted keywords in a separate field of the original documents, and then simply boost matches on that field at search time.

This is not exactly what you described as it doesn't give you fine control over the boost factor for each keyword. But this is definitely a way to get certain documents higher in the search results if the query contains certain keywords.

If you really want finer control over the boost factor for different keywords, you can still get it with this method, but you will need to create several "promoted keywords" fields and boost them differently in your query.

For example:



{ "Path":"http://www.foo.com/doc/abc/1",
  "Title":"Title 1",
  "Description":"The description of doc 1",
  "boost_kw1": "keyword1 keyword2",
  "boost_kw2": "keyword3 keyword4" }
{ "Path":"http://www.foo.com/doc/abc/2",
  "Title":"Title 2",
  "Description":"The description of doc 2",
  "boost_kw1": "keyword3",
  "boost_kw2": "keyword1 keyword2" }

      

And in the query, you would calculate the total score as the sum of:

  • the main query score
  • the match score on "boost_kw1" multiplied by 10
  • the match score on "boost_kw2" multiplied by 5
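
As a rough sketch of that sum, the query body could be built like this. The field names "boost_kw1" and "boost_kw2" come from the example documents above; the function name `build_boosted_query`, the choice of `multi_match` over Title and Description for the main query, and the must/should split are assumptions for illustration.

```python
def build_boosted_query(user_query):
    """Build a search body: main query score plus weighted promoted-keyword matches."""
    return {
        "query": {
            "bool": {
                # The main query decides which documents are returned.
                "must": [
                    {
                        "multi_match": {
                            "query": user_query,
                            "fields": ["Title", "Description"],
                        }
                    }
                ],
                # Optional clauses: each match adds its boosted score to the total.
                "should": [
                    {"match": {"boost_kw1": {"query": user_query, "boost": 10}}},
                    {"match": {"boost_kw2": {"query": user_query, "boost": 5}}},
                ],
            }
        }
    }


search_body = build_boosted_query("keyword1")
```

Note that with the main query under "must", a promoted document still has to match the main query to be returned at all; moving it into "should" would lift that restriction.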