How to determine Lucene relevance / cutoff?

What is the best way to determine relevance and cut off results for display?

The system I'm currently working on searches inventory and returns results. Each result must be checked by an employee to verify that it is a correct match, so we want to minimize the number of false positives we return.

I've tweaked the boosts and so on to get the best results, but we still have trouble determining relevance.

An absolute threshold does not work because search scores are meaningful only relative to the other results of the same query. A score of 200 on one query may be less significant than a score of 0.2 on another.

Another method I have seen is to normalize each score against the top score of the query and return all results within x% of that score. However, if there are no good results, the top result is itself a poor match, and every result we return will be bad.
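To make the problem concrete, here is a minimal sketch of the "within x% of the top score" cutoff. It does not call Lucene at all: the function name `within_fraction_of_top` and both score lists are made up for illustration, and the scores are assumed to come from two unrelated queries.

```python
def within_fraction_of_top(scores, fraction):
    """Keep the scores that are at least `fraction` of the best score in this query."""
    if not scores:
        return []
    top = max(scores)
    return [s for s in scores if s >= fraction * top]


# Hypothetical scores, not real Lucene output.
# Assumption: this query has two genuine matches, even though its scores are small.
good_query = [0.2, 0.18, 0.04, 0.012]
# Assumption: this query has no genuine match, even though its scores are large.
poor_query = [200.0, 190.0, 180.0, 50.0]

kept_good = within_fraction_of_top(good_query, 0.8)
kept_poor = within_fraction_of_top(poor_query, 0.8)

print(kept_good)
print(kept_poor)
```

Under these assumptions the cutoff keeps two results for the first query and three for the second, so the query with no real match actually sends more items to the employee for checking. An absolute threshold fares no better here: any value that admits the 0.2 result also admits everything from the second query.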

How do you determine which documents are relevant and which are not?
