How to query a phrase containing stop words in Elasticsearch
I am indexing some text that contains stop words, and I would like to search it with a "match phrase" query without slop. However, it looks like the removed stop words still take up token positions.
Building index:
PUT /fr_articles
{
"settings": {
"analysis": {
"analyzer": {
"stop": {
"type": "standard",
"stopwords" : ["the"]
}
}
}
},
"mappings": {
"test": {
"properties": {
"title": {
"type": "string",
"analyzer": "stop"
}
}
}
}
}
Add document:
POST /fr_articles/test/1
{
"title" : "Tom the king of Toulon!"
}
Search:
POST /fr_articles/_search
{
"fields": [
"title"
],
"explain": true,
"query": {
"match": {
"title": {
"query": "tom king",
"type" : "phrase"
}
}
}
}
Nothing is found :-(
Is there a way to fix this, perhaps with multiple range queries? I need the terms to be next to each other.
Thanks,
Yes, position increments cause this problem. Although the stop word is removed and is not searchable, removing it does not pull the two words on either side of it next to each other. So the query "tom the king" finds neither "tom king" nor "such that tom will not be their king".
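To see why, here is a toy Python model of positional phrase matching. It is a simplified sketch, not Lucene's implementation: the analyzer lowercases the text, splits it on word characters, and drops stop words while keeping each surviving token's original position (mimicking position increments). The hypothetical `phrase_matches` helper accepts a phrase when the matched document positions, each shifted back by the term's offset in the query, span at most `slop` positions.

```python
import re
from itertools import product


def analyze(text, stopwords):
    """Toy analyzer: lowercase, split on word characters, drop stop words,
    but keep the position each surviving token originally occupied."""
    tokens = []
    for position, term in enumerate(re.findall(r"\w+", text.lower())):
        if term not in stopwords:
            tokens.append((term, position))
    return tokens


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified phrase check: shift every document position by the term's
    offset in the query; match if some combination spans <= slop positions."""
    if not query_tokens:
        return False
    positions = {}
    for term, pos in doc_tokens:
        positions.setdefault(term, []).append(pos)
    candidates = []
    for term, query_pos in query_tokens:
        if term not in positions:
            return False
        candidates.append([p - query_pos for p in positions[term]])
    return any(max(combo) - min(combo) <= slop for combo in product(*candidates))


STOPWORDS = {"the"}  # same stop word list as the "stop" analyzer in the question

doc = analyze("Tom the king of Toulon!", STOPWORDS)
query = analyze("tom king", STOPWORDS)
print(doc)    # [('tom', 0), ('king', 2), ('of', 3), ('toulon', 4)]
print(query)  # [('tom', 0), ('king', 1)]
print(phrase_matches(doc, query))          # False: exact phrase expects king at 1
print(phrase_matches(doc, query, slop=1))  # True: one position of slack

# The query "tom the king" keeps the same gap as the indexed document.
query_with_stop = analyze("tom the king", STOPWORDS)
print(phrase_matches(doc, query_with_stop))                           # True
print(phrase_matches(analyze("Tom king", STOPWORDS), query_with_stop))  # False
```

In this model, "king" sits two positions after "tom" in the indexed document while the query "tom king" expects it one position later, so the exact phrase fails and a slop of 1 succeeds. The query "tom the king" carries the same gap, so it matches the original document but not "Tom king".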
Often, when an analysis filter removes a token, it is not quite as if the token had never been there. The purpose of StopFilter, in particular, is to keep uninteresting terms from producing search hits. It is not meant to change the structure of the document or sentence.
It used to be possible to turn off position increments on StopFilter, but that option was removed as of Lucene 4.4.
Okay, forget about fooling around with a CharFilter. That's an ugly hack; don't do it.
To query without using position increments, you need to configure this in your query parser, not in your analysis. In Elasticsearch this can be done with the Query String Query by setting enable_position_increments to false.
Something like:
{
"query_string" : {
"default_field" : "title",
"query" : "\"tom king\"",
"enable_position_increments" : false
}
}
As a point of interest, the equivalent in raw Lucene is QueryParser.setEnablePositionIncrements.
There used to be an option, enable_position_increments: false, that you could set on the stop filter, for example, but it has been deprecated since Lucene 4.4.
This is a related Lucene issue: https://issues.apache.org/jira/browse/LUCENE-4065
So, for now, the best approach is probably to use the slop option until the Lucene issue is resolved.
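As a sketch of the slop approach, the snippet below builds a search body that could be POSTed to /fr_articles/_search from the question. The helper name `phrase_query_with_slop` is hypothetical; `match_phrase` with its `query` and `slop` parameters is standard Elasticsearch query DSL. The value 1 assumes exactly one removed stop word between the two terms.

```python
import json


def phrase_query_with_slop(field, phrase, slop):
    """Hypothetical helper: build a match_phrase request body that allows
    the query terms to be up to `slop` positions away from exact placement."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# One stop word ("the") was removed between "tom" and "king", so allow 1.
body = phrase_query_with_slop("title", "tom king", slop=1)
print(json.dumps(body, indent=2))
```

Keep in mind that slop cannot tell a removed stop word from a real word, so with slop 1 "tom king" would also match a title such as "Tom, brave king", not only "Tom the king".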