PostgreSQL GIN index slower than GIST for pg_trgm?

Question


Despite what all the documentation says, I find GIN indexes significantly slower than GiST indexes for pg_trgm-related searches. The index is on a table with 25 million rows and a relatively short text column (average length is 21 characters). Most of the rows are addresses of the form "123 Main st, City".

With the GiST index, a lookup like this takes about 4 seconds:

select suggestion from search_suggestions where suggestion % 'seattle';

But with the GIN index it takes 90 seconds. This is the EXPLAIN ANALYZE output:

Bitmap Heap Scan on search_suggestions  (cost=330.09..73514.15 rows=25043 width=22) (actual time=671.606..86318.553 rows=40482 loops=1)
  Recheck Cond: ((suggestion)::text % 'seattle'::text)
  Rows Removed by Index Recheck: 23214341
  Heap Blocks: exact=7625 lossy=223807
  ->  Bitmap Index Scan on tri_suggestions_idx  (cost=0.00..323.83 rows=25043 width=0) (actual time=669.841..669.841 rows=1358175 loops=1)
        Index Cond: ((suggestion)::text % 'seattle'::text)
Planning time: 1.420 ms
Execution time: 86327.246 ms

Note that the index returns over a million rows, although only 40k rows actually match. Any ideas why this performs so badly? This is on PostgreSQL 9.4.


sql pattern-matching indexing postgresql postgresql-performance

Doug 24 Mar 17 at 20:14


1 answer

Erwin Brandstetter · Accepted Answer · 2017-06-30T19:00:50+0000

Some problems stand out:

First, consider upgrading to a current Postgres version. At the time of writing, that is Postgres 9.6, or Postgres 10 (currently in beta). Since 9.4 there have been many improvements to GIN indexes, the pg_trgm extension, and the handling of big data in general.
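After an in-place major-version upgrade (for example with pg_upgrade), an extension installed in an existing database keeps its old version until it is updated explicitly. A minimal sketch for checking and updating pg_trgm, assuming it is installed in the current database:

```sql
-- Server version actually in use.
SELECT version();

-- Installed vs. newest available version of pg_trgm in this database.
SELECT name, installed_version, default_version
FROM   pg_available_extensions
WHERE  name = 'pg_trgm';

-- Update the extension objects to the newest version shipped with the server.
ALTER EXTENSION pg_trgm UPDATE;
```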

Next, you need a lot more RAM, in particular a higher work_mem setting. I can tell from this line in the EXPLAIN output:

Heap Blocks: exact=7625 lossy=223807

"lossy" in the details of a Bitmap Heap Scan (with your specific numbers) indicates a dramatic shortage of work_mem. Postgres only collects block addresses in the Bitmap Index Scan instead of row pointers, because that is expected to be faster with your low work_mem setting (it cannot hold the exact addresses in RAM). This way, many more non-qualifying rows have to be filtered out in the following Bitmap Heap Scan, which is what "Rows Removed by Index Recheck: 23214341" in your plan shows. This linked answer has details:

"Recheck Cond:" line in query plans with a bitmap heap scan

But don't set work_mem too high without considering the whole situation:

Optimize a simple query using date and ORDER BY text
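A minimal sketch of how one might test a higher work_mem for just this query without changing the server-wide setting. The value 256MB is an arbitrary example, not a recommendation: the right value depends on available RAM and on how many sessions run such queries concurrently, since each sort, hash or bitmap operation in each session may use up to that amount.

```sql
BEGIN;

-- Applies only until COMMIT / ROLLBACK; other sessions keep the configured value.
SET LOCAL work_mem = '256MB';

-- Check the "Heap Blocks:" line: ideally only "exact=..." remains, no "lossy=...".
EXPLAIN (ANALYZE, BUFFERS)
SELECT suggestion
FROM   search_suggestions
WHERE  suggestion % 'seattle';

COMMIT;
```

Even with an exact bitmap, "Rows Removed by Index Recheck" is not expected to drop to zero: the trigram index only delivers candidates (1,358,175 rows in the plan above for 40,482 actual matches), which are then rechecked against the heap.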

There may be other problems, like index or table bloat, or more specific configuration bottlenecks. But if you fix just these two items, the query should be much faster.
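A rough first check for bloat, sketched with the standard size functions and the pg_stat_user_tables statistics view. What counts as "too big" or "too many dead tuples" depends on the data, so the numbers need interpretation rather than a fixed threshold:

```sql
-- On-disk size of the trigram index and of its table.
SELECT pg_size_pretty(pg_relation_size('tri_suggestions_idx')) AS index_size,
       pg_size_pretty(pg_relation_size('search_suggestions'))  AS table_size;

-- Live vs. dead tuples and the last (auto)vacuum runs for the table.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'search_suggestions';
```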

Also, do you really need to retrieve all 40k rows in the example? You probably want to add a small LIMIT to the query and make it a "nearest-neighbor" search, in which case a GiST index is the better choice after all, because that kind of query should be faster with a GiST index. Example:

Best index for a similarity function
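A sketch of such a nearest-neighbor query. The index name search_suggestions_trgm_gist_idx is hypothetical, and the CREATE INDEX is only needed if no GiST trigram index exists on the column; a GIN index (gin_trgm_ops) cannot serve ORDER BY on the <-> distance operator, a GiST index (gist_trgm_ops) can. LIMIT 10 is an arbitrary example value.

```sql
-- GiST trigram index on the search column (hypothetical index name).
CREATE INDEX search_suggestions_trgm_gist_idx
    ON search_suggestions USING gist (suggestion gist_trgm_ops);

-- The 10 most similar suggestions; <-> is the trigram distance (1 - similarity).
SELECT suggestion, similarity(suggestion, 'seattle') AS sim
FROM   search_suggestions
WHERE  suggestion % 'seattle'
ORDER  BY suggestion <-> 'seattle'
LIMIT  10;
```

The index can return rows in distance order, so Postgres can stop after the first 10 qualifying rows instead of fetching and sorting all 40k matches.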
