PostgreSQL GIN index slower than GIST for pg_trgm?

Despite what all the documentation says, I find GIN indexes significantly slower than GIST indexes for pg_trgm related searches. The index sits on a 25 million row table with a relatively short text field (average length is 21 characters). Most of the rows are addresses of the form "123 Main st, City".

The GIST index takes about 4 seconds with a lookup like

select suggestion from search_suggestions where suggestion % 'seattle';

      

But GIN takes 90 seconds and returns the following when run with EXPLAIN ANALYZE:

Bitmap Heap Scan on search_suggestions  (cost=330.09..73514.15 rows=25043 width=22) (actual time=671.606..86318.553 rows=40482 loops=1)
  Recheck Cond: ((suggestion)::text % 'seattle'::text)
  Rows Removed by Index Recheck: 23214341
  Heap Blocks: exact=7625 lossy=223807
  ->  Bitmap Index Scan on tri_suggestions_idx  (cost=0.00..323.83 rows=25043 width=0) (actual time=669.841..669.841 rows=1358175 loops=1)
        Index Cond: ((suggestion)::text % 'seattle'::text)
Planning time: 1.420 ms
Execution time: 86327.246 ms

      

Note that over a million rows are fetched by the index, although only 40k rows actually match. Any ideas why this performs so badly? This is on PostgreSQL 9.4.


1 answer


Some problems stand out:

First, consider upgrading to a current version of Postgres. At the time of writing, that is pg 9.6 or pg 10 (currently in beta). Since pg 9.4 there have been multiple improvements for GIN indexes, the pg_trgm module, and big data in general.

Next, you need much more RAM, in particular a higher work_mem setting. I can tell from this line in the EXPLAIN output:

Heap Blocks: exact=7625 lossy=223807

      

"lossy" in the details for a Bitmap Heap Scan (with your particular numbers) indicates a dramatic shortage of work_mem. Postgres only collects block addresses in the bitmap index scan instead of row pointers, because that is expected to be faster with your low work_mem setting (it cannot hold exact addresses in RAM). Many more non-qualifying rows then have to be filtered out in the following Bitmap Heap Scan. This linked answer has details:



But don't set work_mem too high without considering the whole situation:

There could be other problems, like index or table bloat, or further configuration bottlenecks. But if you fix just these two items, the query should already be much faster.
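To check whether work_mem really is the bottleneck, one option is to raise it for a single transaction only and re-run the query under EXPLAIN. The following is a minimal sketch, not a tuning recommendation: the value '256MB' is an assumption picked purely for illustration, and the right number depends on available RAM and the number of concurrent sessions.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of this transaction,
-- so the server-wide setting stays untouched.
-- '256MB' is an illustrative assumption, not a recommended value.
SET LOCAL work_mem = '256MB';

-- Re-run the query from the question and inspect the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT suggestion
FROM   search_suggestions
WHERE  suggestion % 'seattle';

ROLLBACK;
```

If the "Heap Blocks" line no longer reports any lossy blocks and "Rows Removed by Index Recheck" drops sharply, the bitmap fit into memory at that setting.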

Also, do you really need to retrieve all 40k rows in the example? You probably want to add a small LIMIT to the query and make it a "nearest neighbor" search, in which case a GiST index is the better choice after all, because that kind of search is supposed to be faster with a GiST index.
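A nearest-neighbor query of that kind could look roughly like the sketch below. It relies on the pg_trgm distance operator <-> and a GiST index with gist_trgm_ops. The index name search_suggestions_gist_idx and the limit of 10 are assumptions made up for illustration; the table and column names are taken from the question.

```sql
-- Hypothetical index name; gist_trgm_ops is the GiST operator class
-- provided by pg_trgm.
CREATE INDEX search_suggestions_gist_idx
    ON search_suggestions USING gist (suggestion gist_trgm_ops);

-- Return only the closest matches instead of all qualifying rows.
-- "<->" is the trigram distance (1 - similarity), so smaller is closer.
SELECT suggestion, suggestion <-> 'seattle' AS distance
FROM   search_suggestions
WHERE  suggestion % 'seattle'
ORDER  BY suggestion <-> 'seattle'
LIMIT  10;  -- illustrative limit
```

Ordering by the distance operator is what allows the GiST index to return rows in nearest-first order, so the scan can stop once the LIMIT is satisfied.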
