Ignore leading zeros with Elasticsearch

I am trying to create a search bar where the most common query would be for "serviceOrderNo". "serviceOrderNo" is not a numeric field in the database, it is a string field. Examples:

000000007
000000002
WO0000042
123456789
AllTextss
000000054
000000065
000000874

      

The most common format is simply an integer padded with leading zeros.

How do I configure Elasticsearch so that the search for "65" matches "000000065"? I also want to give preference to the "serviceOrderNo" field (which I already have). This is where I am now:

{
   "query": {
      "multi_match": {
         "query": "65",
         "fields": ["serviceOrderNo^2", "_all"]
      }
   }
}

      



2 answers


One way to do this is to use the regexp query, which takes a Lucene-flavoured regular expression:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html

"query": {
     "regexp":{
        "serviceOrderNo": "[0]*65"
     }
}
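For context, the same query embedded in a complete search request would look like the sketch below (the index name my-index is just a placeholder):

GET /my-index/_search
{
   "query": {
      "regexp": {
         "serviceOrderNo": "[0]*65"
      }
   }
}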

      

In addition, the query_string query supports a small set of special characters (a more limited wildcard syntax) as well as Lucene regular expressions, and would look like this: https://www.elastic.co/guide/en/elasticsearch/reference/1.x/query-dsl-query-string-query.html

"query": {
    "query_string": {
       "default_field": "serviceOrderNo",
       "query": "0*65"
    }
}
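If you also want to keep the boost on serviceOrderNo from your original multi_match, query_string accepts a fields list with per-field boosts instead of default_field; a sketch along those lines, using the same wildcard pattern, would be:

"query": {
    "query_string": {
       "fields": ["serviceOrderNo^2", "_all"],
       "query": "0*65"
    }
}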

      



These are fairly simple regular expressions: [0]* matches the character contained in the brackets, 0, repeated any number of times.

If you have the option to re-index, or have not yet indexed your data, you can also make things easier for yourself by writing a custom analyzer. Right now you are using the default analyzer for strings on the serviceOrderNo field, so when you index "serviceOrderNo":"00000065", ES stores just the single token 00000065.

A custom analyzer can index this field as both "00000065" and "65" using the same kind of regex. The advantage is that the regex runs only once, at index time, rather than every time you run a query, because ES will then find a match on either "00000065" or "65".

You can also check the ES documentation on analyzers.

"settings":{
    "analysis": {
        "filter":{
           "trimZero": {
                "type":"pattern_capture",
                "patterns":"^0*([0-9]*$)"
           }
        },
       "analyzer": {
           "serviceOrderNo":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":"trimZero"
           }
        }
    }
},
"mappings":{
    "serviceorderdto": {
        "properties":{
            "serviceOrderNo":{
                "type":"String",
                "analyzer":"serviceOrderNo"
            }
        }
    }
}
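If you go this route, you can verify what the analyzer emits with the _analyze API before running searches (my-index is a placeholder; on 1.x the query-string form below works, newer versions take a JSON body). Because pattern_capture keeps the original token by default, you should see both "000000065" and "65", and a plain match query for "65" will then find the document:

GET /my-index/_analyze?analyzer=serviceOrderNo&text=000000065

GET /my-index/_search
{
    "query": {
        "match": {
            "serviceOrderNo": "65"
        }
    }
}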

      



One way to do this is to use the ngram token filter, so that "12345" is tokenized as:

[ 1, 2, 3, 4, 5 ]
[ 12, 23, 34, 45 ]
[ 123, 234, 345 ]
[ 12345 ]

      

When the field is tokenized like this, a search for "65" will match "000000065".

To set this up, create a new index with a custom analyzer that uses the ngram filter:



POST /my-index
{
   "mappings": {
      "serviceorderdto": {
         "properties": {
            "serviceOrderNo": {
               "type": "string",
               "analyzer": "autocomplete"
            }
         }
      }
   },
   "settings": {
      "analysis": {
         "filter": {
            "autocomplete_filter": {
               "type": "ngram",
               "min_gram": 1,
               "max_gram": 20
            }
         },
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "autocomplete_filter"
               ]
            }
         }
      }
   }
}
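Before reindexing, it is worth sanity-checking the new analyzer with the _analyze API (again, on 1.x the query-string form works; newer versions take a JSON body). For "000000065" the ngram filter with min_gram 1 and max_gram 20 emits every substring, so "65" appears among the tokens, which is why the search below matches:

GET /my-index/_analyze?analyzer=autocomplete&text=000000065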

      

Index some documents, then run your query:

GET /my-index/_search
{
    "query": {
        "multi_match": {
            "query": "55", 
            "fields": [
               "serviceOrderNo^2",
               "_all"
            ]
        }
    }
}   

      
