Concatenation token filter in Elasticsearch
I am trying to index some tags after they have been cleaned up and other token filters applied. These tags can be multiple words.
What I am failing to do is apply a final token filter that collapses the whole token stream back into a single token.
So, I would like multi-word tags to go through stop-word removal, but then be concatenated back into a single token before being stored in the index (essentially what the keyword tokenizer does, but as a token filter).
I don't see a way to do this with the token filters Elasticsearch provides: if I tokenize on whitespace and then stem, every subsequent token filter receives those individual tokens, not the entire token stream, right?
For example, I need the tag
fox jumping over the fence
to be stored in the index as a single token,
fox jumping over the fence
and not as the separate tokens
fox, jump, over, the, fence
Is there a way to do this without pre-processing the string in my application and then indexing it as a not_analyzed field?
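To illustrate the problem, here is a minimal index-settings sketch (the index and analyzer names are placeholders I made up) using only built-in components. Because tokenization happens first, the stop filter only ever sees one token at a time:

```json
PUT /tags
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tag_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

Running "fox jumping over the fence" through this analyzer emits the separate tokens fox, jumping, fence (with "over" and "the" removed as stop words); there is no built-in filter that glues the surviving tokens back into one.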
After doing a little research, I found this thread:
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-concatenation-filter-td3711094.html
which had the exact solution I was looking for.
I created a simple Elasticsearch plugin that provides just a concatenation token filter; you can find it at:
https://github.com/francesconero/elasticsearch-concatenate-token-filter
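Assuming the plugin is installed, the filter can be slotted in at the end of the chain so it runs after stop-word removal. The filter type and parameter names below are my best reading of the plugin's README and may differ between versions, so check the repository before using them:

```json
PUT /tags
{
  "settings": {
    "analysis": {
      "filter": {
        "concatenate_filter": {
          "type": "concatenate",
          "token_separator": " "
        }
      },
      "analyzer": {
        "tag_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "stop", "concatenate_filter"]
        }
      }
    }
  }
}
```

With this chain, "fox jumping over the fence" would be lowercased, stripped of stop words, and then re-joined into the single token "fox jumping fence" before being stored.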