Wildcard search with spaces

I have the following query. I am trying to find the "hello world" values, but it returns no results. However, when I set value to 'hello*', it gives me the expected result. Any idea how I can modify my query to get the "hello world" result? I tried '*hello world*', but for some reason it just won't find anything containing spaces.

I think it has something to do with spaces, since when I search for '* *' it doesn't give me any results either. But I know that I have a variety of values with spaces. Any ideas will help!

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "variant": [
                "collection"
              ]
            }
          }
        ]
      },
      "query": {
        "wildcard": {
          "name": {
            "value": "hello world"
          }
        }
      }
    }
  }
}

What mapping did you use for your field "name"? If you haven't defined a mapping, or you just defined the type as string (without specifying an analyzer), the field will be analyzed with the standard analyzer. That produces the tokens "hello" and "world" separately, and a wildcard query is matched against each token on its own. So the wildcard query will work for something like *ell* or *wor*, but not for patterns containing spaces.

You have to change your mapping so that the "name" field is not_analyzed; then wildcard matching will work, because the whole value is indexed as a single term.
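To see why, here is a rough Python simulation. It is not Elasticsearch code: standard_tokens is a hypothetical, much-simplified stand-in for the standard analyzer (lowercase, then split on anything that is not a letter or digit), and fnmatch.fnmatchcase stands in for Lucene's handling of * and ?. The key point it models is that a wildcard query matches a document if the pattern matches at least one indexed term. (fnmatch also treats [...] as a character class, which Lucene wildcards do not, so keep brackets out of the test patterns.)

```python
import fnmatch
import re


def standard_tokens(text):
    # Hypothetical simplification of the standard analyzer:
    # lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def keyword_tokens(text):
    # A not_analyzed / keyword field indexes the whole value as one term.
    return [text]


def wildcard_matches(pattern, terms):
    # The pattern is tested against each indexed term separately;
    # one matching term is enough for the document to match.
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms)


value = "hello world"
analyzed = standard_tokens(value)  # two terms: "hello" and "world"
keyword = keyword_tokens(value)    # one term: "hello world"

for pattern in ["hello*", "*ell*", "*hello world*", "* *"]:
    print(f"{pattern!r:16} analyzed={wildcard_matches(pattern, analyzed)} "
          f"keyword={wildcard_matches(pattern, keyword)}")
```

'hello*' and '*ell*' match both versions, while '*hello world*' and '* *' match only the keyword version, because no single analyzed term contains a space.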



Warning: wildcard queries are expensive. If you want a partial-match search (the equivalent of SQL's LIKE '%term%'), you can use the ngram token filter in your analyzer and search with a term query. It takes care of partial string matching and performs better too.
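A rough Python sketch of the ngram idea follows. It assumes the ngram filter is applied to the whole lowercased value (as it would be behind a keyword tokenizer, so spaces survive); the helpers ngrams, build_index and term_search are hypothetical, and min_gram=3 / max_gram=11 are arbitrary illustrative values borrowed from the ngram filter's parameter names. At index time every substring in that length range becomes a term, so at search time a substring is found with an exact lookup instead of scanning the term dictionary with a wildcard.

```python
from collections import defaultdict


def ngrams(text, min_gram, max_gram):
    # Every substring whose length lies between min_gram and max_gram,
    # taken from the whole lowercased value (spaces included).
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def build_index(docs, min_gram=3, max_gram=11):
    # Inverted index: ngram term -> set of document ids.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for gram in ngrams(value, min_gram, max_gram):
            index[gram].add(doc_id)
    return index


def term_search(index, query):
    # Exact term lookup; no wildcard expansion is needed.
    return index.get(query.lower(), set())


docs = {1: "hello world", 2: "hello", 3: "world peace"}
index = build_index(docs)
print(sorted(term_search(index, "lo wo")))  # [1]
print(sorted(term_search(index, "world")))  # [1, 3]
```

The trade-off is index size: every value is stored as many overlapping terms, which is the price paid for fast partial matching at query time.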

The "string" type is deprecated; a "string" field with "index": "not_analyzed" now maps to the "keyword" type, which is not split into separate terms. I also had issues with queries containing spaces. I resolved them by splitting the query string on whitespace and building a compound query with one wildcard clause per substring, combined using "bool" and "must":

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "name": "*hello*"
          }
        },
        {
          "wildcard": {
            "name": "*world*"
          }
        }
      ]
    }
  }
}

This method has a small drawback: values such as "othello world!" and other unexpected strings also end up in the results. You can solve this by changing "wildcard" to "match" for every substring except the last one.
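The splitting step can be automated. Below is a minimal Python sketch that builds the bool/must body as a plain dict, ready to be serialized and sent; build_query and its match_all_but_last flag are hypothetical names, and the field name "name" comes from the question. With the flag set, every substring except the last becomes a "match" clause, as suggested above, and only the last one is wrapped in *...*.

```python
import json


def build_query(text, field="name", match_all_but_last=False):
    # Wildcard patterns are not analyzed, so lowercase the input to line up
    # with the lowercase terms produced by the standard analyzer.
    # Then split on whitespace: "hello world" -> ["hello", "world"].
    parts = text.lower().split()
    clauses = []
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        if match_all_but_last and not is_last:
            # Whole-term clause for every substring except the last one.
            clauses.append({"match": {field: part}})
        else:
            # Substring clause: the term must contain `part` somewhere.
            clauses.append({"wildcard": {field: f"*{part}*"}})
    # Every clause must match, in any order.
    return {"query": {"bool": {"must": clauses}}}


print(json.dumps(build_query("hello world"), indent=2))
print(json.dumps(build_query("hello world", match_all_but_last=True), indent=2))
```

Two things to watch in real use: user input containing * or ? is passed through as wildcard syntax, and an empty or all-whitespace input yields an empty "must" list, so you may want to reject such input before sending the query.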



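You could also try to fix the problem by first changing the field type. "keyword" is the type to use from Elasticsearch 5.0 on, while "string" with "index": "not_analyzed" is the older, pre-5.0 equivalent: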

PUT your_index
{
  "mappings": {
    "your_index": {
      "properties": {
        "your_field1": {
          "type": "keyword"
        },
        "your_field2": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
