Lucene 6 - How to influence ranking with a numeric value?

I'm new to Lucene, so I apologize for any unclear wording. I am working on an author search engine. The search query is the author's name. The default search results are good: they return the names that match best. However, we want to rank the results by author popularity as well, using a combination of the default similarity and a numeric value representing the circulations of their titles. The problem with the default results is that they return authors nobody is interested in, and although I can sort by the numeric value alone, the top result is then usually not a great match in terms of name. I've been looking for a solution to this for days.

This is how I am building my index:

    IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get(INDEX_LOCATION)),
        new IndexWriterConfig(new StandardAnalyzer()));
    writer.deleteAll();
    for (Contributor contributor : contributors) {
        Document doc = new Document();
        doc.add(new TextField("name", contributor.getName(), Field.Store.YES));
        doc.add(new StoredField("contribId", contributor.getContribId()));
        doc.add(new NumericDocValuesField("sum", sum));
        writer.addDocument(doc);
    }
    writer.close();

      

The name is the field we want to search on, and the sum is the field we want to weight our search results by (while still taking the best match for the author's name into account). I'm not sure if adding the sum to the document is the right thing to do in this situation. I know it will take some experimentation to figure out how best to balance the weight of the two factors, but my problem is that I don't know how to do it in the first place.

Any examples I could find are either pre-Lucene 4 or don't seem to work. I thought this one was what I was looking for, but it doesn't seem to work either. Help appreciated!

1 answer


As shown in the blog post, you can use CustomScoreQuery; this gives you a lot of flexibility and influence over the scoring process, but it is also a bit overkill. Another possibility is to use FunctionScoreQuery; since they behave differently, I will explain both.

Using FunctionScoreQuery

FunctionScoreQuery can change the score based on a field.

Let's say you usually search like this:

Query q = .... // pass the user input to the QueryParser or similar
TopDocs hits = searcher.search(q, 10); // Get 10 results

      

Then you can modify the query in between like this:

Query q = .....

// Note that a Float field would work better.
DoubleValuesSource boostByField = DoubleValuesSource.fromLongField("sum");

// Create a query, based on the old query and the boost
FunctionScoreQuery modifiedQuery = new FunctionScoreQuery(q, boostByField);

// Search as usual, but with the modified query
TopDocs hits = searcher.search(modifiedQuery, 10);

      

This changes the scoring based on the field value. Unfortunately, however, there is no way to control the impact of the DoubleValuesSource (other than scaling the values during indexing), at least none that I know of.
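
One way to read "scaling the values during indexing" is to compress the raw sum with a logarithm before it is written to the index, so that an author with a thousand times more circulations does not receive a thousand times the boost. The following is only a minimal sketch in plain Java: the class `SumScaling` and the method `scaleForIndex` are hypothetical names, and the logarithm is an illustrative choice, not something Lucene prescribes.

```java
/** Hypothetical helper (not part of Lucene): dampens a raw sum before indexing. */
final class SumScaling {

    private SumScaling() {}

    /**
     * Maps a raw, non-negative sum to a value that grows logarithmically.
     * Math.log1p(x) computes ln(1 + x), so a sum of 0 yields 0.
     */
    static float scaleForIndex(long rawSum) {
        if (rawSum < 0) {
            throw new IllegalArgumentException("rawSum must be non-negative: " + rawSum);
        }
        return (float) Math.log1p((double) rawSum);
    }

    public static void main(String[] args) {
        // Made-up sample sums, only to show how the scaled value grows.
        long[] samples = {0L, 10L, 1_000L, 1_000_000L};
        for (long sum : samples) {
            System.out.println(sum + " -> " + scaleForIndex(sum));
        }
    }
}
```

The scaled value would then be indexed in place of the raw long, which fits the note in the code above that a float field works better. Whether a logarithm, a square root, or a simple division suits your data is something to find out by experiment.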

To have more control, consider using CustomScoreQuery.

Using CustomScoreQuery



Using this kind of query allows you to modify the score of each result in any way you like. In this context, we will use it to alter the score based on a field in the index. First, you will need to store your value during indexing:

doc.add(new StoredField("sum", sum)); 

      

Then we will need to create our own query class:

private static class MyScoreQuery extends CustomScoreQuery {
    public MyScoreQuery(Query subQuery) {
        super(subQuery);
    }

    // The CustomScoreProvider is what actually alters the score
    private class MyScoreProvider extends CustomScoreProvider {

        private LeafReader reader;
        private Set<String> fieldsToLoad;

        public MyScoreProvider(LeafReaderContext context) {
            super(context);
            reader = context.reader();

            // We create a HashSet which contains the name of the field
            // which we need. This allows us to retrieve the document 
            // with only this field loaded, which is a lot faster.
            fieldsToLoad = new HashSet<>();
            fieldsToLoad.add("sum");
        }

        @Override
        public float customScore(int doc_id, float currentScore, float valSrcScore) throws IOException {
            // Get the result document from the index
            Document doc = reader.document(doc_id, fieldsToLoad);

            // Get boost value from index               
            IndexableField field = doc.getField("sum");
            Number number = field.numericValue();

            // This is just an example on how to alter the current score
            // based on the value of "sum". You will have to experiment
            // here.
            float influence = 0.01f;
            float boost = number.floatValue() * influence;

            // Return the new score for this result, based on the 
            // original lucene score.
            return currentScore + boost;
        }           
    }

    // Make sure that our CustomScoreProvider is being used.
    @Override
    public CustomScoreProvider getCustomScoreProvider(LeafReaderContext context) {
        return new MyScoreProvider(context);
    }       
}

      

You can now use the new query class to modify an existing query, similar to FunctionScoreQuery:

Query q = .....

// Create a query, based on the old query and the boost
MyScoreQuery modifiedQuery = new MyScoreQuery(q);

// Search as usual, but with the modified query
TopDocs hits = searcher.search(modifiedQuery, 10);
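
To get a feeling for the `influence` constant before touching the index, it can help to replay the `currentScore + boost` formula from `customScore` on a few invented hits. The sketch below is plain Java without Lucene; `InfluenceDemo`, `Hit`, and the sample scores and sums are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo (not part of Lucene): how the influence factor reorders hits. */
final class InfluenceDemo {

    private InfluenceDemo() {}

    /** Stand-in for a search hit: label, original Lucene score, indexed "sum". */
    static final class Hit {
        final String name;
        final float luceneScore;
        final long sum;

        Hit(String name, float luceneScore, long sum) {
            this.name = name;
            this.luceneScore = luceneScore;
            this.sum = sum;
        }
    }

    /** Same additive formula as in customScore: score + sum * influence. */
    static float rescore(Hit hit, float influence) {
        return hit.luceneScore + hit.sum * influence;
    }

    /** Returns the hit names ordered by descending rescored value. */
    static List<String> rank(List<Hit> hits, float influence) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> rescore(h, influence)).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    /** Invented data: a close name match with a small sum, a looser match with a large one. */
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("close match, few circulations", 4.0f, 10L));
        hits.add(new Hit("loose match, many circulations", 3.0f, 500L));
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleHits(), 0.001f));
        System.out.println(rank(sampleHits(), 0.01f));
    }
}
```

With the smaller influence the closer name match stays on top; with the larger one the sum dominates and the order flips. Real Lucene scores and sums will be on different scales, so the useful range for `influence` has to be found with your own data.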

      

Concluding remarks

By using CustomScoreQuery, you can influence the scoring process in all kinds of ways. Remember, however, that the method customScore is called for every search result, so do not perform expensive calculations there, as this will seriously slow down the search process.

I have created a small gist with a complete working example of CustomScoreQuery here: https://gist.github.com/philippludwig/14e0d9b527a6522511ae79823adef73a
