Create a list of custom stop words in Elasticsearch using Java
To improve search results from Elasticsearch, I want to extend my stop word list from my Java code. So far I'm using the default stop analyzer list, which doesn't include question words such as What, Who, Why, etc. We want to remove these words, and some additional words, from the search text when querying. I tried the code from here (last answer):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}
I tried to use this code from Java, but it didn't work for me. The important question:
How do we create our own stop word list, and how do we use it in our query code?
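import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;

// Query-string query over several fields, analyzed with the built-in "stop" analyzer
QueryStringQueryBuilder qb = new QueryStringQueryBuilder(text).analyzer("stop");
qb.field("question_title");
qb.field("level");
qb.field("category");
qb.field("question_tags");
SearchResponse response = client.prepareSearch("questionindex")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(qb)
        .execute()
        .actionGet();
SearchHit[] results = response.getHits().getHits();
System.out.println("response-" + results.length);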
I am currently using the default stop analyzer, which only removes a limited set of stop words:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
But I want to extend this list.
You are on the right track. In your first listing (from the documentation on stopwords), you created a custom analyzer called my_analyzer for an index named my_index, which will have the effect of removing "and" and "the" from the text you use my_analyzer with.
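To make that effect concrete, here is a rough plain-Java sketch of what a standard analyzer with a custom stopwords list does to a piece of text. It does not use Elasticsearch at all: the class StopWordSketch and the method analyze are hypothetical names for illustration, and the tokenizer is a crude approximation of the real one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Hypothetical helper: lowercases the text, splits it into tokens,
    // and drops every token found in the stop word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Crude tokenizer: split on anything that is not a letter or digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces an empty first token
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("and", "the"));
        // prints [cat, dog]
        System.out.println(analyze("The cat and the dog", stopWords));
    }
}
```

Adding question words such as "what", "who" and "why" to the set removes them in the same way, which is the behaviour you are after for your queries.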
Now, to actually use it, you must:
- Make sure my_analyzer is defined on the index you are querying (questionindex?)
- Create a mapping for your documents that uses my_analyzer for the fields where you want "and" and "the" removed (for example, the question_title field)
- Test the analyzer with the analyze API:
GET /questionindex/_analyze?field=question.question_title&text=No quick brown fox jumps over my lazy dog and the indolent cat
- Reindex your documents
Try this as a starting point:
POST /questionindex
{
  "settings" : {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  },
  "mappings" : {
    "question" : {
      "properties" : {
        "question_title" : {
          "type" : "string",
          "analyzer" : "my_analyzer"
        },
        "level" : {
          "type" : "integer"
        },
        "category" : {
          "type" : "string"
        },
        "question_tags" : {
          "type" : "string"
        }
      }
    }
  }
}
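Since you want to do this from Java, one possible approach is to build the analysis settings from a Java list, so that the stop words live in your code. The following is only a sketch using the standard library: StopwordSettingsSketch and buildSettings are hypothetical names, and it assumes the stop words contain no characters that need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StopwordSettingsSketch {

    // Hypothetical helper: builds the analysis settings JSON for a standard
    // analyzer with a custom stop word list.
    // Assumption: the words need no JSON escaping (no quotes or backslashes).
    static String buildSettings(String analyzerName, List<String> stopWords) {
        String quoted = stopWords.stream()
                .map(w -> "\"" + w + "\"")
                .collect(Collectors.joining(", "));
        return "{ \"analysis\": { \"analyzer\": { \"" + analyzerName + "\": { "
                + "\"type\": \"standard\", \"stopwords\": [ " + quoted + " ] } } } }";
    }

    public static void main(String[] args) {
        // Some default stop words plus the question words from the question
        List<String> stopWords = Arrays.asList("and", "the", "what", "who", "why");
        System.out.println(buildSettings("my_analyzer", stopWords));
    }
}
```

The resulting string has the same shape as the "settings" body above, so it should be usable wherever your client version accepts index settings as JSON when creating the index; check the exact method for your version. Once question_title is mapped to my_analyzer, the query code would also need to stop forcing the built-in analyzer, for example by using .analyzer("my_analyzer") instead of .analyzer("stop").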