Using Django Haystack autocomplete with Elasticsearch to find numbers / digits?

I am using Django Haystack supported by Elasticsearch for autocomplete and I am having trouble finding numbers in a field.

For example, I have a field called 'name' for an object type that has the following values:

['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342', 'SOMETHING ELSE', 'LAST ONE 7']

      

and I would like to use autocomplete to find all objects with the number "7" in the name.

I defined this field in my search index:

name_auto = indexes.EdgeNgramField(model_attr='name')

      

and I am using a query like this:

SearchQuerySet().autocomplete(name_auto='7')

      

However, this search returns no results. I believe this is because the default edgengram_analyzer uses the "lowercase" tokenizer, which splits on every non-letter character and therefore drops digits entirely.
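To illustrate that suspicion, here is a rough pure-Python approximation of the default edgengram_analyzer ("lowercase" tokenizer followed by an edge n-gram filter with min_gram 2 and max_gram 15) applied to the sample names. It is only a sketch: the helper names are made up for illustration, only ASCII letters are handled, and the real analysis happens inside Elasticsearch.

```python
import re

def lowercase_tokenizer(text):
    """Approximate the "lowercase" tokenizer: split on any non-letter
    character and lowercase each token (ASCII letters only in this sketch)."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` with length min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def analyze(text):
    """Tokenize, then expand every token into its edge n-grams."""
    return [gram for tok in lowercase_tokenizer(text) for gram in edge_ngrams(tok)]

names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342',
         'SOMETHING ELSE', 'LAST ONE 7']

for name in names:
    # Digits never survive tokenization, so no indexed term contains "7".
    print(repr(name), '->', analyze(name))
```

In this approximation '7342' produces no terms at all and 'NAME7' is indexed only as prefixes of 'name', so a query for '7' has nothing to match. Note also that min_gram is 2, so a single-character term would not be indexed even if digits were kept.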

I then found elasticstack, which lets you configure the Haystack / Elasticsearch backend, but I cannot get ELASTICSEARCH_INDEX_SETTINGS configured correctly to provide the functionality I need.

The default settings are as follows:

ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "synonym_analyzer" : {
                    "type": "custom",
                    "tokenizer" : "standard",
                    "filter" : ["synonym"]
                },
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram", "synonym"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                },
                "synonym" : {
                    "type" : "synonym",
                    "ignore_case": "true",
                    "synonyms_path" : "synonyms.txt"
                }
            }
        }
    }
}

      

I tried changing the edgengram_analyzer block in several ways without success. Adding something like this

"token_chars": [ "letter", "digit" ]

to "haystack_ngram_tokenizer" doesn't work either.

Can someone help me figure out how to use haystack / elasticsearch / autocomplete to find numbers? Or do I need to split the "name" field into all possible n-grams and then use a standard match search? Any help would be greatly appreciated.

Thank you so much!


1 answer


This solution helped me: http://silentsokolov.github.io/2014/09/03/django-haystack-elasticsearch-prombiemy-avtodopolnieniia.html

The article is written in Russian, so use Google Translate.
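For readers who cannot follow the link, here is a minimal sketch of one possible direction. It is an assumption, not a summary of the linked article: switch the analyzer to a tokenizer that keeps digits ("standard") and use a plain n-gram filter starting at length 1, so that a digit in the middle or at the end of a token is indexed too. The settings fragment reuses the old-style names from the question; the small simulation below it is a hypothetical approximation of the resulting analysis, not Elasticsearch itself, and newer Elasticsearch versions may restrict how far min_gram and max_gram can differ.

```python
import re

# Hypothetical override (assumption): only the parts relevant to digits.
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        'analysis': {
            'analyzer': {
                'ngram_analyzer': {
                    'type': 'custom',
                    # "standard" keeps digits, unlike "lowercase".
                    'tokenizer': 'standard',
                    'filter': ['lowercase', 'haystack_ngram'],
                },
            },
            'filter': {
                'haystack_ngram': {
                    'type': 'nGram',
                    'min_gram': 1,   # allow single-character terms such as "7"
                    'max_gram': 15,
                },
            },
        },
    },
}

def ngrams(token, min_gram, max_gram):
    """All substrings of `token` with length min_gram..max_gram."""
    return {token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)}

def analyze(text):
    """Rough stand-in for the analyzer above: lowercase, split into
    alphanumeric tokens, then expand each token into n-grams."""
    cfg = ELASTICSEARCH_INDEX_SETTINGS['settings']['analysis']['filter']['haystack_ngram']
    terms = set()
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        terms |= ngrams(token, cfg['min_gram'], cfg['max_gram'])
    return terms

names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342',
         'SOMETHING ELSE', 'LAST ONE 7']

# Names whose indexed terms would contain the query term "7".
matches = [name for name in names if '7' in analyze(name)]
print(matches)
```

With these assumptions the simulation selects 'NAME7', '7342' and 'LAST ONE 7'. In a real project the field would also have to be mapped to this analyzer and the index rebuilt after changing the settings.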
