Wildcard search with spaces
I have the following query. I am trying to find values containing "hello world", but it returns no results. However, when value = 'hello*', it gives me the expected result. Any idea how I can modify my query so that it matches "hello world"? I tried *hello world*, but for some reason it just won't match anything that contains spaces.
I think it has something to do with spaces, because a search for "* *" doesn't return any results either, even though I know I have plenty of values with spaces. Any ideas would help!
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "variant": [
                "collection"
              ]
            }
          }
        ]
      },
      "query": {
        "wildcard": {
          "name": {
            "value": "hello world"
          }
        }
      }
    }
  }
}
What mapping did you use for your field name? If you haven't defined any mapping, or you just defined the type as string (without any analyzer), then the field will be analyzed using the standard analyzer. This creates the separate tokens "hello" and "world". That means the wildcard query will work for something like *ell* or *wor*, but not for patterns that contain spaces.
You have to change your mapping so that the "name" field is not_analyzed; then wildcard matching across the space will work.
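To make the difference concrete, here is a rough Python sketch of what happens on each kind of field. It does not call Elasticsearch: `standard_like_tokens` and `wildcard_hits` are hypothetical helpers, the tokenizer is only an approximation of the standard analyzer (lowercase, split on non-word characters), and `fnmatchcase` stands in for wildcard matching against a whole indexed term.

```python
import re
from fnmatch import fnmatchcase


def standard_like_tokens(text):
    # Approximation of the standard analyzer (assumption, not the real one):
    # lowercase the text and split it on runs of non-word characters.
    return [token for token in re.split(r"\W+", text.lower()) if token]


def wildcard_hits(pattern, terms):
    # A wildcard pattern is compared against each indexed term as a whole,
    # never against the original, unsplit field value.
    return [term for term in terms if fnmatchcase(term, pattern)]


value = "hello world"
analyzed_terms = standard_like_tokens(value)  # one term per word
not_analyzed_terms = [value]                  # the whole value is one term

print(wildcard_hits("hello*", analyzed_terms))
print(wildcard_hits("*hello world*", analyzed_terms))
print(wildcard_hits("*hello world*", not_analyzed_terms))
```

On the analyzed field no single term contains a space, so neither `*hello world*` nor `* *` can match anything; on the not_analyzed field the whole value is one term and the pattern matches.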
A word of caution: wildcard queries are expensive. If you want partial matching (the equivalent of SQL's LIKE '%...%'), you can use the ngram token filter in your analyzer and search with a term query. It takes care of the partial string match and performs better too.
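As an illustration of why ngrams help, here is a small Python sketch of the idea, not of Elasticsearch itself. The `ngrams` helper is hypothetical; `min_gram` and `max_gram` mirror the parameter names of the ngram token filter, and the value 3 is an arbitrary choice for the example.

```python
def ngrams(token, min_gram=3, max_gram=3):
    # Emit every substring of the token whose length lies between
    # min_gram and max_gram (inclusive).
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Index time: each token of "hello world" is expanded into its ngrams.
indexed_terms = set(ngrams("hello")) | set(ngrams("world"))

# Search time: a partial string becomes an exact term lookup,
# so no wildcard scan over the terms is needed.
print(sorted(indexed_terms))
print("ell" in indexed_terms)
```

The cost moves from query time to index time: the index gets larger, but each partial match is a plain term lookup.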
The "string" type is deprecated; with index "not_analyzed" it maps to the "keyword" type, which is not split into substrings. I had problems with queries containing spaces before and solved them by splitting the query into substrings at the whitespace and building a combined query, adding a wildcard clause for each substring using "bool" and "must":
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "name": "*hello*"
          }
        },
        {
          "wildcard": {
            "name": "*world*"
          }
        }
      ]
    }
  }
}
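This method has a slight disadvantage: every value that contains both substrings matches, regardless of their order or of what lies between them, so "hello cruel world!" and other unexpected values end up in the results. You can mitigate this by changing "wildcard" to "match" for all but the last substring.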
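The whitespace-splitting approach described above could be generated on the client side roughly like this. It is a minimal sketch: `build_wildcard_query` is a hypothetical helper name, and it only builds the request body as a Python dict; sending it to the cluster is left out.

```python
import json


def build_wildcard_query(field, text):
    # Split the search text at whitespace and add one wildcard clause per
    # substring; "must" requires every clause to match.
    clauses = [{"wildcard": {field: f"*{part}*"}} for part in text.split()]
    return {"query": {"bool": {"must": clauses}}}


body = build_wildcard_query("name", "hello world")
print(json.dumps(body, indent=2))
```

For "hello world" this produces the same body as the query shown above. Note that the substrings are not escaped, so a `*` or `?` typed by the user is passed through as a wildcard.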
You should first try to fix the problem by changing the field type:
PUT your_index
{
  "mappings": {
    "your_index": {
      "properties": {
        "your_field1": {
          "type": "keyword"
        },
        "your_field2": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}