Solr Suggester: Duplicate Results in Distributed Search (SolrCloud)

I have two shards and I am trying to use the suggester (Solr 4.10.1) with distributed search. The suggester seems to query each shard and append each shard's suggestions to the result set, leaving duplicates. In my solrconfig.xml file I have the following:

<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">AnalyzingLookupFactory</str>
      <str name="lookupImpl">FreeTextSuggesterFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title_sug</str>
      <str name="weightField">rank</str>
      <str name="suggestAnalyzerFieldType">shingleSuggest</str>
      <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>


<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

      

http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&shards.qt=/suggest&shards=shard1,shard2&suggest.q=an&wt=json&indent=true

leads to:

{
  "responseHeader":{
    "status":0,
    "QTime":12},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an",
            "weight":142,
            "payload":""},
          {
            "term":"an american",
            "weight":6,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn (main",
            "weight":2,
            "payload":""}]}}}}

      

As you can see above, the suggestion "an" is returned twice, once from each shard. If I make the same request with distrib=false:

http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&distrib=false&suggest.q=an&wt=json&indent=true

I get no duplicates, as I would expect:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an absolution take",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to sea,",
            "weight":1,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""}]}}}}

      

Is there a way to remove duplicate results?

1 answer


You can use Solr's result grouping (field collapsing) feature; add the following to your request:

&group=true&group.field=term&group.main=true



This returns only one document per distinct term, and group.main=true makes Solr return them in the same flat format as a normal query.
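
As a rough illustration of how those parameters would be appended to the request from the question, here is a minimal Python sketch. The helper name build_suggest_url is hypothetical, the host, collection and shard names are copied from the question, and the field name term is an assumption taken from the suggestion entries in the response, so adjust it to your setup. Whether grouping is actually applied by the suggest handler is not verified here.

```python
from urllib.parse import urlencode

# Base URL copied from the question; adjust host and collection as needed.
BASE_URL = "http://localhost:8983/solr/collection1/suggest"


def build_suggest_url(query, group_field="term"):
    """Build the distributed suggest URL with the grouping parameters added.

    build_suggest_url is a hypothetical helper, not part of any Solr client.
    """
    params = [
        ("suggest.dictionary", "titleSuggester"),
        ("shards.qt", "/suggest"),
        ("shards", "shard1,shard2"),
        ("suggest.q", query),
        ("wt", "json"),
        ("indent", "true"),
        # Grouping parameters suggested in this answer.
        ("group", "true"),
        ("group.field", group_field),
        ("group.main", "true"),
    ]
    # urlencode percent-encodes reserved characters such as "/" and ",".
    return BASE_URL + "?" + urlencode(params)


url = build_suggest_url("an")
print(url)
```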

See http://wiki.apache.org/solr/FieldCollapsing for details.
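
If the duplicates still show up in the merged response, another option is to collapse them on the client. This is a minimal sketch of that idea, separate from the grouping approach, and it assumes the JSON shape shown in the question. The function name dedupe_suggestions is hypothetical, and keeping the highest weight per term is just one possible policy.

```python
def dedupe_suggestions(response, dictionary, query):
    """Return suggestions with one entry per term, keeping the highest weight.

    Assumes the response layout from the question:
    response["suggest"][dictionary][query]["suggestions"] is a list of
    {"term": ..., "weight": ..., "payload": ...} dicts.
    """
    suggestions = response["suggest"][dictionary][query]["suggestions"]
    best = {}
    for suggestion in suggestions:
        term = suggestion["term"]
        # Keep the first entry seen for a term unless a heavier one appears.
        if term not in best or suggestion["weight"] > best[term]["weight"]:
            best[term] = suggestion
    # Sort by weight, highest first; ties keep their first-seen order.
    return sorted(best.values(), key=lambda s: s["weight"], reverse=True)


# Small sample in the same shape as the distributed response in the question.
sample = {
    "suggest": {"titleSuggester": {"an": {
        "numFound": 3,
        "suggestions": [
            {"term": "an", "weight": 149, "payload": ""},
            {"term": "an", "weight": 142, "payload": ""},
            {"term": "an american", "weight": 6, "payload": ""},
        ],
    }}}
}

deduped = dedupe_suggestions(sample, "titleSuggester", "an")
for item in deduped:
    print(item["term"], item["weight"])
```

Note that after dropping duplicates the list can be shorter than suggest.count, so you may want to request a few more suggestions than you intend to display.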
