Using Django Haystack autocomplete with Elasticsearch to find numbers / digits?
I am using Django Haystack supported by Elasticsearch for autocomplete and I am having trouble finding numbers in a field.
For example, I have a field called 'name' for an object type that has the following values:
['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342', 'SOMETHING ELSE', 'LAST ONE 7']
and I would like to use autocomplete to find all objects with the number "7" in the name.
I set my search_index with this field:
name_auto = indexes.EdgeNgramField(model_attr='name')
and I am using a query like this:
SearchQuerySet().autocomplete(name_auto='7')
However, this search returns no results. I believe this is because the edge n-gram analyzer that Haystack sets up for Elasticsearch uses the "lowercase" tokenizer by default, which splits on every non-letter character and therefore drops digits entirely.
I then found elasticstack, which lets you configure the Haystack / Elasticsearch backend, but I have not been able to set ELASTICSEARCH_INDEX_SETTINGS in a way that gives me the functionality I need.
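The effect can be approximated in plain Python. The sketch below is not Elasticsearch code: `lowercase_tokenize`, `edge_ngrams` and `analyze` are hypothetical helpers that only imitate a tokenizer that splits on non-letters and lowercases, followed by a front edge n-gram filter with the `min_gram` of 2 and `max_gram` of 15 from the default settings quoted in this question.

```python
import re


def lowercase_tokenize(text):
    """Imitate the `lowercase` tokenizer: keep runs of letters, lowercased."""
    # [^\W\d_]+ matches one or more letters (no digits, no underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front edge n-grams; a token shorter than min_gram yields nothing."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def analyze(text):
    """Tokenize, then expand every token into its front edge n-grams."""
    return [gram for token in lowercase_tokenize(text) for gram in edge_ngrams(token)]


names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342', 'SOMETHING ELSE', 'LAST ONE 7']
for name in names:
    print(repr(name), '->', analyze(name))

# The query string goes through the same steps and ends up with no terms at all.
print(repr('7'), '->', analyze('7'))
```

Under these assumptions 'NAME7' is indexed as 'na', 'nam', 'name', while '7342' and the query '7' produce no terms, which would explain the empty result.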
The default settings are as follows:
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "synonym_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["synonym"]
                },
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram", "synonym"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                },
                "synonym": {
                    "type": "synonym",
                    "ignore_case": "true",
                    "synonyms_path": "synonyms.txt"
                }
            }
        }
    }
}
I tried changing the edgengram_analyzer block in several ways without success. Adding something like this
"token_chars": [ "letter", "digit" ]
to "haystack_ngram_tokenizer" doesn't work either.
Can someone help me figure out how to use haystack / elasticsearch / autocomplete to find numbers? Or do I need to split the "name" field into all possible n-grams and then use a standard match search? Any help would be greatly appreciated.
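To make the "all possible n-grams" idea concrete, here is a small pure-Python sketch. It is only an illustration: `front_edge_ngrams`, `all_ngrams` and `matches` are hypothetical helpers, words are split on whitespace as a stand-in for a real tokenizer, and `min_gram` is assumed to be 1 so that a single digit can form a term.

```python
def front_edge_ngrams(token, min_gram=1, max_gram=15):
    """Prefixes of the token only: 'name7' -> 'n', 'na', ..., 'name7'."""
    token = token.lower()
    longest = min(len(token), max_gram)
    return {token[:size] for size in range(min_gram, longest + 1)}


def all_ngrams(token, min_gram=1, max_gram=15):
    """Every substring whose length lies between min_gram and max_gram."""
    token = token.lower()
    grams = set()
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.add(token[start:start + size])
    return grams


names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342', 'SOMETHING ELSE', 'LAST ONE 7']


def matches(query, grams_fn):
    """Names in which at least one word has the query among its n-grams."""
    query = query.lower()
    return [name for name in names
            if any(query in grams_fn(word) for word in name.split())]


print(matches('7', front_edge_ngrams))  # ['7342', 'LAST ONE 7']
print(matches('7', all_ngrams))         # ['NAME7', '7342', 'LAST ONE 7']
```

With prefixes only, the 7 in 'NAME7' is never a term of its own, so matching a digit anywhere inside a word needs full n-grams rather than edge n-grams, at the cost of a larger index.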
Thank you so much!
This solution worked for me: http://silentsokolov.github.io/2014/09/03/django-haystack-elasticsearch-prombiemy-avtodopolnieniia.html
The article is written in Russian, so you may need Google Translate to read it.
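For readers who cannot follow the link, the general kind of change involved might look like the sketch below. This is an assumption about the approach, not a transcription of the article: it swaps the "lowercase" tokenizer of edgengram_analyzer for the "standard" tokenizer plus a "lowercase" token filter, so digits survive tokenization, and lowers min_gram to 1 so that a single character such as "7" still produces a term. Only the entries that differ from the defaults in the question are shown.

```python
# Hypothetical override, assuming the backend in use actually reads
# ELASTICSEARCH_INDEX_SETTINGS (for example through elasticstack).
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                # ... synonym_analyzer and ngram_analyzer left as in the defaults ...
                "edgengram_analyzer": {
                    "type": "custom",
                    # "standard" keeps digits; "lowercase" as a tokenizer drops them.
                    "tokenizer": "standard",
                    # Lowercasing moves from the tokenizer into a token filter.
                    "filter": ["lowercase", "haystack_edgengram"],
                },
            },
            "filter": {
                # ... haystack_ngram and synonym left as in the defaults ...
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 1,  # was 2; a one-character token needs min_gram 1
                    "max_gram": 15,
                },
            },
        }
    }
}
```

Analysis settings only apply to newly created indexes, so the index would have to be rebuilt afterwards (for example with `python manage.py rebuild_index`). Even then, an edge n-gram filter only indexes prefixes of each token, so "7" should match '7342' and 'LAST ONE 7' but not 'NAME7'; matching a digit in the middle of a token would need the n-gram analyzer instead.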