Elasticsearch: boosting score with nested queries

I have the following query in Elasticsearch version 1.3.4:

{
   "filtered": {
      "query": {
         "bool": {
            "should": [
               {
                  "bool": {
                     "should": [
                        {
                           "match_phrase": {
                              "_all": "java"
                           }
                        },
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match_phrase": {
                                       "_all": "adobe creative suite"
                                    }
                                 }
                              ]
                           }
                        }
                     ]
                  }
               },
               {
                  "bool": {
                     "should": [
                        {
                           "nested": {
                              "path": "skills",
                              "query": {
                                 "bool": {
                                    "must": [
                                       {
                                          "term": {
                                             "skills.name.original": "java"
                                          }
                                       },
                                       {
                                          "bool": {
                                             "should": [
                                                {
                                                   "match": {
                                                      "skills.source": {
                                                         "query": "linkedin",
                                                         "boost": 5
                                                      }
                                                   }
                                                }, 
                                                {
                                                   "match": {
                                                      "skills.source": {
                                                         "query": "meetup",
                                                         "boost": 5
                                                      }
                                                   }
                                                }                                                
                                             ]
                                          }
                                       }
                                    ],
                                    "minimum_should_match": "100%"
                                 }
                              }
                           }
                        }
                     ]
                  }
               }
            ],
            "minimum_should_match": "100%"
         }
      },
      "filter": {
         "and": [
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "java"
                        }
                     }
                  ]
               }
            },
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "ajax"
                        }
                     },
                     {
                        "term": {
                           "skills.name.original": "html"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

      

The mappings look like this:

  skills: {
    type: "nested", 
    include_in_parent: true, 
    properties: {                 
      name: {
        type: "multi_field",
        fields: {
          name: {type: "string"},
          original: {type : "string", analyzer : "string_lowercase"} 
        }              
      }                                                       
    }
  }

      

and finally the document structure for skills (excluding other parts) is as follows:

  "skills": 
  [
    {
      "name": "java",
      "source": [
         "linkedin", 
         "facebook"
      ]
    },
    {
      "name": "html",
      "source": [
         "meetup"
      ]
    }
  ]

      

My goal with this query is to first filter out irrelevant hits (the filter at the bottom of the query), then score each person by searching the whole document for the match_phrase "java", boosting further if it also contains the match_phrase "adobe creative suite". After that, I check the nested "skills" value that produced the hit to see which source(s) the skill came from, and boost the query based on the source or sources of that nested object.

This kind of works, in that I don't get any errors, but the final score is odd and it's hard to tell whether it works. If I give it a small boost, say 2, the score goes down a bit: my top hit currently has a score of 32.176407 with boost = 1. With a boost of 5, it went down to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score drops to 2.433376.

Is this the correct way to do it, or is there a better / easier way? I could change the structure and mappings, etc. And why is my score decreasing?

Edit: I've simplified the query a bit by only dealing with one "skill":

{
   "filtered": {
      "query": {
         "bool": {
            "must": [
               {
                  "bool": {
                     "must": [
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match_phrase": {
                                       "_all": "java"
                                    }
                                 }
                              ],
                              "minimum_should_match": 1
                           }
                        }
                     ]
                  }
               }
            ],
            "should": [
               {
                  "nested": {
                     "path": "skills",
                     "score_mode": "avg",
                     "query": {
                        "bool": {
                           "must": [
                              {
                                 "term": {
                                    "skills.name.original": "java"
                                 }
                              }
                           ],
                           "should": [
                              {
                                 "match": {
                                    "skills.source": {
                                       "query": "linkedin",
                                       "boost": 1.2
                                    }
                                 }
                              },
                              {
                                 "match": {
                                    "skills.source": {
                                       "query": "meetup",
                                       "boost": 1.2
                                    }
                                 }
                              }
                           ]
                        }
                     }
                  }
               }
            ]
         }
      },
      "filter": {
         "and": [
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "java"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

      

Now the problem is that I have two similar documents where the only difference is the "source" value of the "java" skill: "linkedin" and "meetup" respectively. In my new query they get the same boost, but the final _score is very different for the two documents.

From the explain output of the query for doc one:

"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"

      

and for doc two:

"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"

      

These values are the only ones that differ from each other, and I cannot figure out why.

1 answer


I can't answer the question about the boost, but how many shards does the index have? TF and IDF are calculated per shard, not per index, and this can create a difference in scores: https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ

If you re-index with only 1 shard, does that change the result?
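
To try the single-shard test, one option is to create a fresh index with one primary shard and re-index the documents into it. The sketch below only builds the create-index request; the index name `people_single_shard` is a hypothetical placeholder, and sending the request is left to whatever HTTP client you already use.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "people_single_shard"

# Index settings forcing a single primary shard, so that all term
# statistics (TF/IDF) come from one shard.
single_shard_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
        }
    }
}


def create_index_request(index_name, body):
    """Return the HTTP method, path and JSON body for creating the index."""
    return "PUT", "/" + index_name, json.dumps(body, indent=2)


method, path, payload = create_index_request(INDEX_NAME, single_shard_settings)
print(method, path)
print(payload)
```

The existing mapping for `skills` would need to be applied to the new index before re-indexing. An alternative that avoids re-indexing, not mentioned above, may be the `search_type=dfs_query_then_fetch` search parameter, which asks Elasticsearch to collect term statistics across shards before scoring.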



Edit: Also, the doc range in the explain output is the range of document IDs within the shard, and you can use it to calculate the IDF for each document and check the results.
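
As a rough illustration of why per-shard statistics matter, here is a minimal sketch of the IDF formula used by Lucene's classic TF/IDF similarity, `1 + ln(numDocs / (docFreq + 1))`. The shard names and the document counts are made-up numbers, not values taken from the index in the question.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as in Lucene's classic TF/IDF similarity:
    1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for one term (e.g. "linkedin" in
# skills.source). The numbers are assumptions for illustration only.
shards = {
    "shard_0": {"doc_freq": 40, "num_docs": 126},
    "shard_1": {"doc_freq": 5, "num_docs": 126},
}

idf_by_shard = {
    name: classic_idf(stats["doc_freq"], stats["num_docs"])
    for name, stats in shards.items()
}

for name, idf in sorted(idf_by_shard.items()):
    print(name, round(idf, 4))
```

The same term gets a higher IDF on the shard where it is rarer, so two otherwise identical documents that live on different shards can receive different scores for the same clause. Plugging the `docFreq` and `maxDocs` values from each document's explain output into `classic_idf` is one way to check whether this accounts for the difference.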
