Create a list of custom stop words in Elasticsearch using Java

To improve the search results from Elasticsearch, I want to extend my stop word list from my Java code. So far I am using the default list of the stop analyzer, which does not include question words such as What, Who, Why, etc. We want to remove these words, plus some additional ones, from the search text when querying. I tried the code from here (the last answer):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}

I tried this from Java, but it did not work for me. My main question:

How do we create our own stop word list, and how do we use it in our query code?

QueryStringQueryBuilder qb = new QueryStringQueryBuilder(text).analyzer("stop");
qb.field("question_title");
qb.field("level");
qb.field("category");
qb.field("question_tags");
SearchResponse response = client.prepareSearch("questionindex")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(qb)
        .execute()
        .actionGet();
SearchHit[] results = response.getHits().getHits();
System.out.println("response-" + results.length);

      

I am currently using the default stop analyzer, which removes only a limited set of stop words, such as:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

But I want to extend this list.


1 answer


You are on the right track. In your first listing (from the documentation on stopwords), you created a custom analyzer called my_analyzer for an index named my_index, which removes "and" and "the" from any text you use my_analyzer with.
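To make the effect of such an analyzer concrete, here is a rough plain-Java illustration. It is not Lucene or Elasticsearch code: the class StopwordSketch and its analyze method are hypothetical names, and the regex-based splitting only approximates what the standard tokenizer does. It does mirror the order of operations, lowercasing first and filtering stop words afterwards, which is why the stop words themselves should be given in lowercase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only: this is NOT Lucene's or Elasticsearch's implementation.
class StopwordSketch {

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop every token that appears in the stop word set.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same stop words as in the my_analyzer definition.
        Set<String> stopwords = new HashSet<>(Arrays.asList("and", "the"));
        System.out.println(analyze(
                "No quick brown fox jumps over my lazy dog and the indolent cat", stopwords));
        // prints [no, quick, brown, fox, jumps, over, my, lazy, dog, indolent, cat]
    }
}
```

Note that "no" survives in this sketch because the stop word set contains only "and" and "the". Any other word you want removed has to be added to the list explicitly.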

Now, to actually use it, you must:

  • Make sure my_analyzer is defined on the index you are actually querying (questionindex?).
  • Create a mapping for your documents that uses my_analyzer for the fields where you want to remove "and" and "the" (for example, the question_title field).
  • Test the analyzer with the analysis API:

    GET /questionindex/_analyze?field=question.question_title&text=No quick brown fox jumps over my lazy dog and the indolent cat

  • Reindex your documents, so that existing documents are analyzed with the new analyzer.




Try this as a starting point:

POST /questionindex
{
    "settings" : {
        "analysis": {
            "analyzer": {
                "my_analyzer": { 
                    "type": "standard", 
                    "stopwords": [ "and", "the" ] 
                }
            }
        }
    },
    "mappings" : {
        "question" : {
            "properties" : {
                "question_title" : { 
                    "type" : "string", 
                    "analyzer" : "my_analyzer" 
                },
                "level" : { 
                    "type" : "integer" 
                },
                "category" : { 
                    "type" : "string" 
                },
                "question_tags" : { 
                    "type" : "string" 
                }
            }
        }
    }
}
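Since the question asks about doing this from Java with a longer word list, the following is a minimal sketch of building the analysis settings from a Java list. It uses only the standard library. The class StopwordSettings and the method analyzerSettings are hypothetical names, and the question words are just the examples from the question. The result is the JSON value that belongs under "settings" in the request above. How you hand it to the Java client when creating the index depends on your client version and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StopwordSettings {

    // Builds the JSON value for "settings": a standard analyzer with a custom stop word list.
    // Assumption: the words contain no characters that would need JSON escaping.
    static String analyzerSettings(String analyzerName, List<String> stopwords) {
        StringBuilder words = new StringBuilder();
        for (String word : stopwords) {
            if (words.length() > 0) {
                words.append(", ");
            }
            words.append('"').append(word).append('"');
        }
        return "{ \"analysis\": { \"analyzer\": { \"" + analyzerName + "\": { "
                + "\"type\": \"standard\", \"stopwords\": [ " + words + " ] } } } }";
    }

    public static void main(String[] args) {
        // Start from the two words used above, then add lowercase question words.
        List<String> stopwords = new ArrayList<>(Arrays.asList("and", "the"));
        stopwords.addAll(Arrays.asList("what", "who", "why"));
        System.out.println(analyzerSettings("my_analyzer", stopwords));
    }
}
```

Keeping the list in code like this makes it easy to extend, but the analyzer only affects documents indexed after the index has been created with these settings.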

      
