Why isn't my leading wildcard search working in Solr?
I have a text field that I populate with copyField from various source fields, and that one field is what I search against in the Solr index.
This text field uses the custom fieldType "text_en_splitting_reversed". I created this field type by copying the "text_en_splitting" example and adding the ReversedWildcardFilterFactory to the index analyzer.
<!-- Just like text_en_splitting, but with the addition of reversed tokens for leading wildcard matches -->
<fieldType name="text_en_splitting_reversed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
My main problem: I am getting unexpected results when searching with a leading wildcard. For example, I know that one specific search, "*car", should return one match (the document contains the word "racecar"). Since it did not, I decided to debug it in the analysis tool in Solr Admin. Here is a screenshot of my test:
I'm new to this analysis tool, but shouldn't the right side keep the leading asterisk all the way through? And why isn't it reversed? Do I need to reverse the user-entered keywords myself?
In my query configuration, I use edismax. However, in the admin GUI analysis tool, I don't see a way to control whether it uses the standard parser or edismax. (Maybe it doesn't matter?)
In case it helps provide more context, here are my goals for indexing this particular field:
- I would like *car to match racecar. This does not work.
- I would like $30 to match documents containing $30 but not 30 (no dollar sign). So I added the types attribute, pointing at a file in which I define $ as DIGIT. This one works.
- I would like 30 to match documents containing $30. This does not work.
Ultimately, the main issue with wildcards was a bug in our search front end. We have code that wraps every keyword or phrase in quotation marks before the request is sent to Solr. That way, if a phrase is entered, it is surrounded by quotes and works fine, and regular keyword searches are unaffected.
But apparently, putting quotes around a wildcard search makes it fail for some reason. When I removed the quotes, *car matched the documents that contain racecar, as hoped.
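The likely reason is that the standard and edismax query parsers do not expand wildcards inside a quoted phrase, so a quoted `"*car"` is analyzed as ordinary text. A minimal sketch of the fix, assuming a Python front end: `prepare_term` is a hypothetical helper (not the original code, and not part of Solr) that keeps quoting plain keywords and phrases but passes anything containing `*` or `?` through unquoted.

```python
# Characters that the standard and edismax parsers treat as wildcards.
WILDCARD_CHARS = ("*", "?")


def prepare_term(user_input: str) -> str:
    """Quote plain keywords and phrases; leave wildcard terms unquoted.

    Hypothetical helper for illustration only.
    """
    term = user_input.strip()
    if any(ch in term for ch in WILDCARD_CHARS):
        # Quoting would stop the wildcard from being expanded, so leave it bare.
        return term
    # Escape backslashes and embedded quotes so the phrase stays well-formed.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(prepare_term("race car"))  # quoted as a phrase
print(prepare_term("*car"))      # left unquoted
```

This sketch does not escape other query syntax characters, and a multi-word input that contains a wildcard is passed through as separate unquoted terms, so a real front end would probably need more careful handling.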
As for my secondary problem (why "30" doesn't match documents containing "$30"), I solved it in a separate Stack Overflow thread: How do I find documents containing numbers and dollar signs in Solr?
As an aside, I think there is a bug in the Solr admin GUI analysis tool. When testing wildcard searches, I can never get any highlighting indicating that a match would have been made, which added to my confusion while trying to debug the problem.
You can see from your screenshot that the WordDelimiterFilterFactory has removed your leading *. Try adding preserveOriginal="1" to your query-side analyzer.
<filter class="solr.WordDelimiterFilterFactory"
preserveOriginal="1"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"
types="word-delim-types.txt" />