Elasticsearch fuzzy matching optimization for a large server and a server cluster

Question

I have an index with some fairly complex queries running against it. The main slowdown comes from fuzzy queries that run against a field containing 2-5 words per record. Essentially, I need to find strings that differ by 1-3 characters.
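
To make "differ by 1-3 characters" concrete: Elasticsearch fuzzy matching is based on Levenshtein edit distance (newer versions also count a transposition as one edit by default). The sketch below computes the plain Levenshtein distance with the textbook dynamic-programming algorithm. It only illustrates the metric; Lucene does not run this per term (recent versions use Levenshtein automata), and `levenshtein` and `is_fuzzy_match` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def is_fuzzy_match(a: str, b: str, max_edits: int = 3) -> bool:
    """True if a and b are within max_edits edits of each other."""
    return levenshtein(a, b) <= max_edits


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                              # 3
    print(is_fuzzy_match("elasticsearch", "elasticearch", max_edits=1))  # True
```

Note that recent Elasticsearch versions accept at most 2 edits in a fuzzy query.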

On my 4-core (with HT), 8 GB machine, my queries take about 1-2 seconds each. On a server with 12 cores (with HT) and 72 GB of RAM, the same query takes 0.3-0.5 seconds. That doesn't seem like reasonable scaling for the hardware difference. I'm sure there must be some hidden settings for tuning query performance.
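
For reference, the scaling in question can be put in numbers. The sketch below turns the quoted latency ranges into a speedup range and compares it with the ratio of physical cores. Pairing the range endpoints this way is an assumption, since the question gives ranges rather than matched measurements of the same query.

```python
def speedup_range(old_s, new_s):
    """(worst, best) speedup given (min, max) latency ranges on two machines."""
    old_min, old_max = old_s
    new_min, new_max = new_s
    # Worst case: fastest run on the old machine vs slowest run on the new one.
    return old_min / new_max, old_max / new_min


old_latency = (1.0, 2.0)  # seconds: 4 physical cores (HT), 8 GB
new_latency = (0.3, 0.5)  # seconds: 12 physical cores (HT), 72 GB
low, high = speedup_range(old_latency, new_latency)
core_ratio = 12 / 4
print(f"observed speedup: {low:.1f}x to {high:.1f}x; core ratio: {core_ratio:.0f}x")
# observed speedup: 2.0x to 6.7x; core ratio: 3x
```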

I have looked through the Elasticsearch guide but could not find anything that helps with performance tuning based on the number of CPUs or the amount of RAM, or any tuning settings specifically for fuzzy queries.
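
As an illustration of the knobs a fuzzy query does expose, the sketch below builds a fuzzy query request body as a Python dict. It assumes a recent Elasticsearch version (parameter names have changed since 2012), and the field name `title` is hypothetical. `prefix_length` (leading characters that must match exactly) and `max_expansions` (how many index terms the query may expand to) both limit how many candidate terms get examined, which is where fuzzy queries spend their CPU time.

```python
import json

# Hypothetical field name; substitute the field that holds the 2-5 word values.
FIELD = "title"


def fuzzy_query(term: str, max_edits: int = 2, prefix_length: int = 1,
                max_expansions: int = 50) -> dict:
    """Build a fuzzy query body; the field name above is an assumption."""
    return {
        "query": {
            "fuzzy": {
                FIELD: {
                    "value": term,
                    "fuzziness": max_edits,          # maximum edit distance
                    "prefix_length": prefix_length,  # exact-match prefix prunes candidates
                    "max_expansions": max_expansions,  # cap on expanded terms
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(fuzzy_query("elasticearch"), indent=2))
```

A fuzzy query matches single terms, so on an analyzed 2-5 word field it is compared against the individual words rather than the whole value.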

Another question: how well does it scale if I add another such server? Will the query time be cut roughly in half?

performance elasticsearch fuzzy-search

Yervand Aghababyan 28 Mar 12 at 16:57

1 answer

imotov · Accepted Answer · 2012-03-28T17:37:39+0000

There are a couple of possibilities here. First, your query may be I/O-bound. In that case, simply adding another server can help, since the two nodes will be reading data from two drives. The other possibility is that your query is CPU-bound. To a large extent, searching a single shard is a single-threaded process. Assuming your index was created with default settings, it has 5 shards, so your query cannot benefit significantly from running on more than 5 processors. In that case, adding another node will only slow things down because of network overhead. Instead, you need to recreate the index with more shards.
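
The CPU-bound argument can be sketched as a toy model: one thread per shard, shards spread evenly over nodes, and each node searching at most as many shards at once as it has cores. Everything here is an assumption for illustration (the 2-second single-threaded cost, and ignoring I/O, network overhead and result merging); the function names are hypothetical. The settings body uses the real `number_of_shards` index setting, which is fixed when the index is created. The 5-shard default applied to Elasticsearch versions before 7.0; newer versions default to 1.

```python
import json
import math


def estimated_query_time(single_thread_s: float, shards: int, cores: int,
                         nodes: int = 1) -> float:
    """Toy CPU-bound model: each shard is searched by one thread, shards are
    spread evenly over nodes, and a node searches at most `cores` shards at once.
    Ignores I/O, network overhead and result merging."""
    shards_per_node = math.ceil(shards / nodes)
    # Number of sequential rounds a node needs to get through its shards.
    waves = math.ceil(shards_per_node / cores)
    return single_thread_s / shards * waves


def create_index_body(shards: int) -> dict:
    """Request body for creating an index with an explicit shard count."""
    return {"settings": {"number_of_shards": shards}}


if __name__ == "__main__":
    base = 2.0  # hypothetical single-threaded cost of one query, in seconds
    for shards, cores, nodes in [(5, 4, 1), (5, 12, 1), (5, 12, 2), (12, 12, 1)]:
        t = estimated_query_time(base, shards, cores, nodes)
        print(f"{shards} shards, {cores} cores x {nodes} node(s): {t:.2f} s")
    print(json.dumps(create_index_body(12)))
```

In this model, going from 4 to 12 cores with 5 shards helps only until every shard has its own core, and a second node adds nothing; only raising the shard count lets the extra cores contribute.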