Autocomplete feature using Solr4 on multivalued fields

I've seen posts about doing autocomplete on multiple fields, but not about doing autocomplete on multivalued fields.

My autocomplete function works for non-multivalued fields.

My problem is that when I run a query on a multivalued field, every value in a matching document's multivalued field is returned in the facet results, not just the values that match the query.

Below is my schema, similar to the one suggested in the Solr 4 Cookbook.

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="publisherText-str" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="publisherText-ac" type="text_autocomplete" indexed="true" stored="true" required="false" multiValued="true"/>

      

As you can see, publisherText is a multivalued field. I am running a query like this to test the autocomplete feature:

/select?q=publisherText-ac:new&facet=true&facet.field=publisherText-str&facet.mincount=1&rows=0

      

The query is "new", and it matches a set of documents. However, the facet results contain every publisherText value (from the multivalued field) of each matching document, whether or not that value matches.

Update: When querying for "new", the result set should include "New York Times" and "Times New Roman", but I don't need to solve the infix problem: "Knewton Gazette" should not be in the result set.

Is there a way to make the facet results contain only the values that match the query? Or is there another (better?) way to support a full autocomplete feature that handles multivalued fields more intelligently?

Thanks.

+3




3 answers


I think the best approach would be to create a separate collection or core (depending on whether you are using SolrCloud or not) and index your data in such a way that it can be queried directly for the desired autocomplete results. Of course, this may not be possible in some cases, but if it is possible in yours, go for it. Such a core contains only the fields and data related to autocomplete, so in most cases it will be smaller than the original core and have fewer terms, which leads to faster queries. In addition, such a core or collection can be tuned specifically for autocomplete queries, which improves performance further.
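A minimal sketch of how data for such a dedicated core could be prepared, assuming each distinct publisher value becomes its own small document. The function name and the `id` and `publisher_ac` field names are hypothetical and are not taken from the question's schema:

```python
def build_autocomplete_docs(source_docs, field="publisherText"):
    """Flatten a multivalued field into one document per distinct value."""
    seen = set()
    docs = []
    for doc in source_docs:
        for value in doc.get(field, []):
            text = value.strip()
            key = text.lower()
            if key and key not in seen:
                seen.add(key)
                # One document per distinct value, intended for the
                # separate autocomplete core (hypothetical field names).
                docs.append({"id": key, "publisher_ac": text})
    return docs
```

Because every document in such a core holds exactly one value, a match on the autocomplete field can only return values that actually match the query.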



However, if you cannot use the separate core/collection approach, then highlighting might be the best way to go if you need the filtering. In that case, you might want to enable term vectors and use FastVectorHighlighting to improve Solr highlighting performance ( http://solr.pl/en/2011/06/13/solr-3-1-fastvectorhighlighting/ ).
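As a rough sketch, a highlighting request of this kind could be built as follows. It assumes Solr 3.1 to 4.x, where the `hl.useFastVectorHighlighter` parameter switches the highlighter, and it assumes the highlighted field is declared with `termVectors`, `termPositions` and `termOffsets` set to `true` in the schema. The function name is hypothetical:

```python
from urllib.parse import urlencode

def highlight_query(term, field="publisherText-ac"):
    """Build a /select request that highlights matches in `field`."""
    params = [
        ("q", f"{field}:{term}"),
        ("hl", "true"),
        ("hl.fl", field),
        # Assumption: the field stores term vectors with positions and
        # offsets, which the FastVectorHighlighter needs.
        ("hl.useFastVectorHighlighter", "true"),
        ("wt", "json"),
    ]
    return "/select?" + urlencode(params)
```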

+3




I have used these two approaches:

(A) Stick with facets and accept that you need to narrow the results yourself with a regex or String.startsWith. This might not be so bad if you use a front-end component such as the YUI3 Autocomplete plugin, which offers this filtering with little extra effort.

(B) Use highlighting by adding this to your query:

&hl=true&hl.fl=publisherText-ac

      



For each hit, the highlighting component returns the matching value, including the highlight tags (<em> by default). This is even more useful if the autocomplete field is fed by multiple source fields and you don't want to search through the results to see which field contains the matching value. However, the resulting list may contain duplicates.
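A minimal sketch of turning the highlighting section of a JSON response (`wt=json`) into a suggestion list, with duplicates removed on the client side. The function name is hypothetical, and the sketch assumes the default `<em>` tags:

```python
import re

def suggestions_from_highlighting(response, field="publisherText-ac"):
    """Collect unique highlighted values from a Solr JSON response."""
    tag = re.compile(r"</?em>")
    seen = set()
    suggestions = []
    # "highlighting" maps each document id to {field: [snippet, ...]}.
    for fields in response.get("highlighting", {}).values():
        for snippet in fields.get(field, []):
            text = tag.sub("", snippet)  # drop the default <em> tags
            if text not in seen:
                seen.add(text)
                suggestions.append(text)
    return suggestions
```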

I use both approaches: (A) for autocomplete on single fields, (B) when autocompleting across multiple fields. I tried to get rid of the <em> tags included in the highlighting results, but that turned out not to be possible (you can only change them, not remove them completely).

(Using Solr 4.0 here.)
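Approach (A) can be sketched in Python as follows, assuming the facet values arrive in Solr's default flat JSON layout (`[value, count, value, count, ...]`). The function name is hypothetical. It keeps a value when any whitespace-separated word starts with the typed prefix, which mirrors the edge n-gram behaviour of the schema in the question:

```python
def filter_facet_values(facet_list, prefix):
    """Keep facet values in which some word starts with `prefix`."""
    prefix = prefix.lower()
    values = facet_list[0::2]  # even positions hold the facet values
    counts = facet_list[1::2]  # odd positions hold their counts
    return [
        (value, count)
        for value, count in zip(values, counts)
        if any(word.startswith(prefix) for word in value.lower().split())
    ]
```

With the values from the question, "New York Times" and "Times New Roman" pass the filter for the prefix "new", while "Knewton Gazette" is dropped.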

+1




You can just use the parameter facet.prefix=new and let Solr filter these entries for you. I would also consider not creating n-grams here: faceting on the field and using facet.prefix already does the trick. As long as you don't have too many unique terms, the performance should be fine.
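A sketch of such a request, built in Python. The function name and the match-all query `*:*` are assumptions made for illustration. Note that facet.prefix is compared against the indexed terms of the facet field, so on a `string` field such as `publisherText-str` the match is case-sensitive and anchored at the start of the whole value: "New" would match "New York Times" but not "Times New Roman".

```python
from urllib.parse import urlencode

def facet_prefix_query(prefix, facet_field="publisherText-str"):
    """Build a facet-only request restricted to terms starting with `prefix`."""
    params = [
        ("q", "*:*"),   # assumption: match all documents, facet on everything
        ("rows", "0"),  # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.prefix", prefix),
        ("facet.mincount", "1"),
    ]
    return "/select?" + urlencode(params)
```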

+1

