Elasticsearch: separating words by underscore; Nothing found

Question

I am setting up an analyzer that splits words on the underscore character as well as on all other punctuation characters. I decided to use the word_delimiter filter. Then I set my analyzer as the default for the desired field.

I have two problems:

1. The analyzer splits the string into words, but does not preserve the original string, despite the preserve_original parameter. See the analyzer test below.
2. Searching for underscore-delimited substrings still returns no results.

Here are my template, data object, analyzer test and search queries:

PUT simple
{
  "template" : "simple",
  "settings" : {
    "index" : {
      "analysis" : {
          "analyzer" : {
              "underscore_splits_words" : {
                  "tokenizer" : "standard",
                  "filter" : ["word_delimiter"],
                  "generate_word_parts" : true,
                  "preserve_original" : true
              }
          }
      }
    },
    "mappings": {
        "_default_": {
             "properties" : {
                "request" : { "type" : "string", "analyzer" : "underscore_splits_words" }
            }
        }
    }
  }
}

Data object:

POST simple/0 
{ "request" : "GET /queue/1/under_score-hyphenword/poll?ttl=300&limit=10" }

This returns tokens: "under", "score", "hyphenword", but not "underscore_splits_words":

POST simple/_analyze?analyzer=underscore_splits_words
{"/queue/1/under_score-hyphenword/poll?ttl=300&limit=10"}

Search results

Hit:

GET simple/_search?q=hyphenword

Hit:

POST simple/_search
{ 
"query": {
        "query_string": {
          "query": "hyphenword"
        }
      }
}

Miss:

GET simple/_search?q=score

Miss:

POST simple/_search
{ 
"query": {
        "query_string": {
          "query": "score"
        }
      }
}

Please suggest the correct way to achieve my goal. Thanks!

elasticsearch

Volodymyr linevych 05 Aug 15 at 16:06

1 answer

Jona · Accepted Answer · 2015-08-05T18:19:54+0000

You should be able to use the "simple" analyzer for this. There is no need for a custom analyzer, because the simple analyzer combines the letter tokenizer with lowercasing, so any non-letter character starts a new token. The reason you are not getting any hits is that you are not specifying a field in your query, so you are searching the _all field, which exists mainly for convenient full-text search. The _all field is not analyzed with your custom analyzer but with the default (standard) one, which keeps under_score as a single token; that is why hyphenword matches and score does not.
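
To see how the simple analyzer tokenizes such a string, the _analyze API can be called directly. This is a minimal sketch, assuming the Elasticsearch 1.x request style used elsewhere in this thread (newer versions expect a JSON body instead of query parameters):

```
# Assumption: ES 1.x query-string form of the _analyze API
GET _analyze?analyzer=simple&text=key_word-hyphenword
```

The response should list the tokens key, word and hyphenword, since both the underscore and the hyphen are non-letter characters.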

Create Index

PUT myindex
{
    "mappings":     {
        "mytype": {
            "properties": {
                "request": {
                    "type": "string",
                    "analyzer": "simple"
                }
            }
        }
    }
}

Insert document

POST myindex/mytype/1 
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }

Search for the document

GET myindex/mytype/_search?q=request:key

The same search using the query DSL:

POST myindex/mytype/_search
 {
     "query": {
         "query_string": {
             "default_field": "request", 
             "query": "key"
         }
     }
 }

Another search using the query DSL:

POST myindex/mytype/_search
{
    "query": {
        "bool": {
            "must": [
                { "match": { "request": "key"}}
            ]
        }
    }
}

The query result looks correct:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "myindex",
            "_type": "mytype",
            "_id": "1",
            "_score": 0.095891505,
            "_source": {
               "request": "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10"
            }
         }
      ]
   }
}

If you want to omit the field name when searching (NOT RECOMMENDED), you can set the default analyzer for all mappings in the index when you create it. (Note: this feature is deprecated and you shouldn't use it, for performance and stability reasons.)

Create an index with a default mapping so that the _all field is analyzed with the "simple" analyzer

PUT myindex
{
    "mappings":     {
        "_default_": {
            "index_analyzer": "simple"
        }
    }
}

Insert document

POST myindex/mytype/1 
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }

Search the index without specifying a field

GET myindex/mytype/_search?q=key

You will get the same result (1 hit).
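
If you would still rather keep word_delimiter, the first problem in the question (preserve_original having no effect) is probably caused by placing generate_word_parts and preserve_original on the analyzer itself; they are options of the word_delimiter filter, so they belong on a custom filter definition. In the question's request, mappings also appears to be nested inside settings rather than next to it. The following is only a sketch under those assumptions, in the same 1.x syntax as above. The filter name my_word_delimiter is hypothetical, and the whitespace tokenizer and lowercase filter are my substitutions, not part of the original setup:

```
# Assumptions: filter name "my_word_delimiter" is illustrative; "whitespace" replaces the "standard" tokenizer
PUT simple
{
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "generate_word_parts": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "underscore_splits_words": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["my_word_delimiter", "lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "request": { "type": "string", "analyzer": "underscore_splits_words" }
      }
    }
  }
}
```

The whitespace tokenizer is used here because the standard tokenizer already strips most punctuation before the filter runs, which leaves preserve_original nothing to preserve. As with the simple analyzer, the search still has to name the field, for example GET simple/_search?q=request:score.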
