Postgresql trigram text search with pg_search result order
I have implemented trigram search using pg_search gem on rails. https://github.com/Casecommons/pg_search
The problem is that sometimes the order of the returned results seems to be wrong according to the definition of trigram search, which shows the gem documentation:
Trigram search works by counting the number of three-letter substrings (or "trigrams") between the query and the text.
My application receives string input from the user ("111 Streetname") and returns a list of addresses matching the Address.full_string value, with a rough search using a trigram.
List of search examples
Search for trigrams: "1493 cambrid"
- Results:
- 100 Cambridgeside Pl
- 100 Cambridgeside Pl
- 150 Cambridgepark Dr
- 1575 Cambridge St
- 1573 Cambridge St
- 1493 Cambridge St
Search for trigrams: "1493 cambr"
- Result:
- 1493 Cambridge St
Search for trigrams: "1493 cambri"
- Results:
- 1575 Cambridge St
- 1573 Cambridge St
- 1493 Cambridge St
Search for trigrams: "1493 cambridge"
- Results:
- 1493 Cambridge St
- 5 Cambridgepark Dr
- 7 Cambridgepark Dr
- 100 Cambridgeside Pl
- and much more
Question
¿Why isn't "1493 Cambridge St" always on top of the results? ¿Do I need to modify the trigram search query or is this just the way the algorithm works?
Request example
SELECT "addresses".*, (ts_rank((to_tsvector('simple', coalesce("addresses"."full_string"::text, ''))), (to_tsquery('simple', ''' ' || '1493' || ' ''') && to_tsquery('simple', ''' ' || 'cambridge' || ' ''')), 0)) AS pg_search_rank FROM "addresses" WHERE (((coalesce("addresses"."full_string"::text, '')) % '1493 cambridge')) ORDER BY pg_search_rank DESC, "addresses"."id" ASC
source to share
While you're in the tutorial on finding trigrams , you're actually working with a function ts_rank()
from text search .
If you order results
(addresses.full_string <-> '1493 cambridge')
... you get what you ask for. <->
is the operator of the trigram distance.
You can also use the %
("similarity") operator in a sentence WHERE
. Ideally, you would have a GiST index with an gist_trgm_ops
in column for this.
source to share