Search 300 million addresses with pg_trgm

I have 300 million addresses in my PostgreSQL 9.3 database and I want to use pg_trgm to fuzzy-search them. The ultimate goal is to implement a search function similar to Google Maps search.

When I used pg_trgm to search these addresses, it took about 30 seconds to get the results. Many rows clear the default similarity threshold of 0.3, but I only need 5 or 10 results. I created a GiST trigram index:

CREATE INDEX addresses_trgm_index ON addresses USING gist (address gist_trgm_ops);

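For completeness: pg_trgm ships as an extension and must be enabled in the database before the gist_trgm_ops operator class above is available. A minimal sketch:

-- Enable the trigram extension once per database (provides gist_trgm_ops,
-- the % operator, and the similarity() function used below).
CREATE EXTENSION IF NOT EXISTS pg_trgm;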

This is my query:

SELECT address, similarity(address, '981 maun st') AS sml 
FROM addresses 
WHERE address % '981 maun st' 
ORDER BY sml DESC 
LIMIT 10;

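As an aside, since many rows clear the default 0.3 threshold, raising it shrinks the candidate set the index scan has to produce before sorting. A hedged sketch using pg_trgm's set_limit() function (the right tool on 9.3; the value 0.5 is only an illustration to tune against your data):

-- Raise the similarity threshold used by the % operator (default 0.3).
SELECT set_limit(0.5);

-- Verify the threshold currently in effect.
SELECT show_limit();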

The test table in the production environment has been deleted, so I am showing EXPLAIN output from my test environment. It has about 7 million rows and takes about 1.6 seconds to get the results. With 300 million rows, it takes more than 30 seconds.

ebdb=> explain analyse select address, similarity(address, '781 maun st') as sml from addresses where address % '781 maun st' order by sml desc limit 10;
                                    QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7615.83..7615.86 rows=10 width=16) (actual time=1661.004..1661.010 rows=10 loops=1)
 ->  Sort  (cost=7615.83..7634.00 rows=7268 width=16) (actual time=1661.003..1661.005 rows=10 loops=1)
     Sort Key: (similarity((address)::text, '781 maun st'::text))
     Sort Method: top-N heapsort  Memory: 25kB
     ->  Index Scan using addresses_trgm_index on addresses  (cost=0.41..7458.78 rows=7268 width=16) (actual time=0.659..1656.386 rows=5241 loops=1)
           Index Cond: ((address)::text % '781 maun st'::text)
 Total runtime: 1661.066 ms
(7 rows)


Is there a good way to improve performance, or would partitioning the table be a good plan?



1 answer


PostgreSQL 9.3 ... Is there a good way to improve performance, or would partitioning the table be a good plan?

Partitioning the table won't help at all.

But yes, there is a good way: upgrade to a current version of Postgres. Many improvements have been made to GiST indexes, to pg_trgm in particular, and to handling big tables in general. It should be substantially faster with Postgres 9.6 or the upcoming Postgres 10 (currently in beta).

Your query looks correct, but for a small LIMIT like this, use this equivalent nearest-neighbor query instead:



SELECT address, similarity(address, '981 maun st') AS sml 
FROM   addresses 
WHERE  address % '981 maun st' 
ORDER  BY address <-> '981 maun st'
LIMIT  10;

Quote from the manual:

It will usually beat the first formulation when only a small number of the closest matches is wanted.
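To confirm the rewritten query actually uses the index: per the pg_trgm docs, the <-> distance operator is supported by GiST trigram indexes but not by GIN, so the existing gist_trgm_ops index is the right choice. A quick check, assuming the index from the question:

-- The plan should show an index scan on addresses_trgm_index with the
-- ORDER BY ... <-> ... pushed into the index, so the scan can stop after
-- producing the 10 nearest matches instead of sorting all candidates.
EXPLAIN ANALYZE
SELECT address, similarity(address, '981 maun st') AS sml
FROM   addresses
WHERE  address % '981 maun st'
ORDER  BY address <-> '981 maun st'
LIMIT  10;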







