I am setting up an analyzer that splits words on the underscore character as well as on all other punctuation characters. I decided to use the word_delimiter token filter, and then set my analyzer as the analyzer for the desired field.

I have two problems:

  • The analyzer splits the text into words, but does not keep the original token, despite the preserve_original parameter. See the _analyze test below.
  • Searching on underscore-delimited substrings still produces no results.

Here is my template, data object, analyzer test and search queries:

PUT simple
{
  "template" : "simple",
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "underscore_splits_words" : {
            "tokenizer" : "standard",
            "filter" : ["word_delimiter"],
            "generate_word_parts" : true,
            "preserve_original" : true
          }
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties" : {
        "request" : { "type" : "string", "analyzer" : "underscore_splits_words" }
      }
    }
  }
}


Data object:

POST simple/0 
{ "request" : "GET /queue/1/under_score-hyphenword/poll?ttl=300&limit=10" }


This returns the tokens "under", "score" and "hyphenword", but not the original "under_score":

POST simple/_analyze?analyzer=underscore_splits_words
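To make the expected behaviour concrete, here is a rough Python model of what word_delimiter with generate_word_parts and preserve_original is meant to emit for a single token. It is a simplification (it only splits on non-alphanumeric characters and ignores case-change and letter/digit splitting), and `word_delimiter_sketch` is a hypothetical name, not part of Elasticsearch or Lucene.

```python
import re


def word_delimiter_sketch(token: str, preserve_original: bool = True) -> list[str]:
    """Hypothetical, simplified model of the word_delimiter token filter.

    Splits one token on any non-alphanumeric character (including "_"),
    mimicking generate_word_parts, and, if preserve_original is set,
    also emits the untouched token first.
    """
    # Word parts: runs of letters/digits; underscore counts as a delimiter.
    parts = [p for p in re.split(r"[\W_]+", token) if p]
    out = []
    # Only add the original when splitting actually changed something.
    if preserve_original and parts != [token]:
        out.append(token)
    out.extend(parts)
    return out


# The standard tokenizer keeps "under_score" as one token (it does not break
# on underscores), so that is the token word_delimiter receives here.
print(word_delimiter_sketch("under_score"))  # ['under_score', 'under', 'score']
print(word_delimiter_sketch("hyphenword"))   # ['hyphenword']
```

With preserve_original working as intended, "under_score" would appear alongside its parts.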


Search queries:


GET simple/_search?q=hyphenword



POST simple/_search
{
  "query": {
    "query_string": { "query": "hyphenword" }
  }
}



GET simple/_search?q=score



POST simple/_search
{
  "query": {
    "query_string": { "query": "score" }
  }
}


Please suggest the correct way to achieve my goal. Thanks!


You should be able to use the built-in "simple" analyzer for this. There is no need for a custom analyzer, because the simple analyzer uses the lowercase tokenizer, which combines a letter tokenizer with lowercasing, so any non-letter character (including _ and -) starts a new token. The reason you are not getting any hits is that you are not specifying the field in your request, so the query runs against the _all field, which is mainly intended for convenient full-text search and is analyzed with the default (standard) analyzer rather than the analyzer you set on request.
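The difference can be approximated in Python. Both regexes are rough stand-ins, not the real Lucene tokenizers: `simple_analyzer_sketch` splits on anything that is not a letter and lowercases (the documented behaviour of the simple analyzer), while `standard_like_sketch` only approximates the standard analyzer for this particular input, keeping underscores inside words and splitting on hyphens, slashes and other punctuation. Both function names are hypothetical.

```python
import re

REQUEST = "GET /queue/1/under_score-hyphenword/poll?ttl=300&limit=10"


def simple_analyzer_sketch(text: str) -> list[str]:
    # Runs of letters only ([^\W\d_] = word characters minus digits and "_"),
    # lowercased, approximating the lowercase tokenizer behind "simple".
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def standard_like_sketch(text: str) -> list[str]:
    # Rough approximation of the standard analyzer for this input only:
    # \w+ keeps "under_score" together and splits on "-", "/", "?", "=", "&".
    return [t.lower() for t in re.findall(r"\w+", text)]


print(simple_analyzer_sketch(REQUEST))
# ['get', 'queue', 'under', 'score', 'hyphenword', 'poll', 'ttl', 'limit']
print(standard_like_sketch(REQUEST))
# ['get', 'queue', '1', 'under_score', 'hyphenword', 'poll', 'ttl', '300', 'limit', '10']
```

Under this approximation, "score" is a separate token only in the simple output, which matches the explanation above.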

Create Index

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "request": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}


Insert document

POST myindex/mytype/1 
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }


Search for the document by field:

GET myindex/mytype/_search?q=request:key
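If you build this URI search from code, the field prefix is simply part of the q parameter and should be URL-encoded with it. A minimal sketch using Python's standard library; the base URL http://localhost:9200 and the helper name `uri_search_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local Elasticsearch node


def uri_search_url(index: str, doc_type: str, query: str) -> str:
    # Builds <BASE>/<index>/<type>/_search?q=<query>, percent-encoding the query.
    return f"{BASE}/{index}/{doc_type}/_search?{urlencode({'q': query})}"


print(uri_search_url("myindex", "mytype", "request:key"))
# http://localhost:9200/myindex/mytype/_search?q=request%3Akey
```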


The same search using the query DSL:

POST myindex/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "request",
      "query": "key"
    }
  }
}


Another option using the query DSL:

POST myindex/mytype/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "request": "key" } }
      ]
    }
  }
}
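For reference, the same bool/match body could be prepared from Python with only the standard library. This is a sketch: the host and the helper name `match_request` are assumptions, and the request is built but not sent (passing it to `urllib.request.urlopen` would send it).

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumed local Elasticsearch node


def match_request(index: str, doc_type: str, field: str, text: str) -> urllib.request.Request:
    # Same body as the bool/must/match query above, targeting one named field.
    body = {"query": {"bool": {"must": [{"match": {field: text}}]}}}
    return urllib.request.Request(
        f"{BASE}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = match_request("myindex", "mytype", "request", "key")
print(req.get_method(), req.full_url)
# POST http://localhost:9200/myindex/mytype/_search
```

Naming the field explicitly is what makes the query use the simple analyzer configured on request.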


The query result looks correct:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "myindex",
            "_type": "mytype",
            "_id": "1",
            "_score": 0.095891505,
            "_source": {
               "request": "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10"
            }
         }
      ]
   }
}
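To check such a result programmatically, one could parse the response body and pull out the matched _source values. A minimal sketch with Python's json module; `matched_requests` is a hypothetical helper. In this response format hits.total is a plain number; newer Elasticsearch versions return it as an object instead.

```python
import json

# Response body copied from the result above.
RESPONSE = """
{"took": 1, "timed_out": false,
 "_shards": {"total": 5, "successful": 5, "failed": 0},
 "hits": {"total": 1, "max_score": 0.095891505,
  "hits": [{"_index": "myindex", "_type": "mytype", "_id": "1",
            "_score": 0.095891505,
            "_source": {"request": "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10"}}]}}
"""


def matched_requests(raw: str) -> list[str]:
    # Collect the "request" field from the _source of every hit.
    data = json.loads(raw)
    return [hit["_source"]["request"] for hit in data["hits"]["hits"]]


print(json.loads(RESPONSE)["hits"]["total"], matched_requests(RESPONSE))
```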


If you want to omit the field name from your query (not recommended), you can set the default analyzer for all types in the index when you create the index. (Note: this feature is deprecated, and you shouldn't use it for performance and stability reasons.)

Create an index with a _default_ mapping so that the _all field is analyzed with the "simple" analyzer:

PUT myindex
{
  "mappings": {
    "_default_": { "index_analyzer": "simple" }
  }
}


Insert document

POST myindex/mytype/1 
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }


Search the index without specifying a field:

GET myindex/mytype/_search?q=key


You will get the same result (1 hit).
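The idea behind this workaround can be modelled roughly: a query without a field targets _all, which behaves like one field holding all of the document's string values, analyzed with a single analyzer. The sketch below is a hypothetical model (not how Elasticsearch stores _all), assumes that analyzer is the simple analyzer, reuses the letter-run approximation of it, and handles single-term queries only.

```python
import re


def simple_tokens(text: str) -> list[str]:
    # Approximation of the simple analyzer: letter runs, lowercased.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def all_field_tokens(doc: dict) -> set[str]:
    # Hypothetical model of _all: every string field value joined together,
    # then analyzed with one analyzer (here the simple sketch).
    joined = " ".join(v for v in doc.values() if isinstance(v, str))
    return set(simple_tokens(joined))


def fieldless_match(doc: dict, term: str) -> bool:
    # Single-term query with no field name: look the term up in _all.
    return term.lower() in all_field_tokens(doc)


doc = {"request": "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10"}
print(fieldless_match(doc, "key"))  # True
```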



