Postgresql trigram text search with pg_search result order

I have implemented trigram search using pg_search gem on rails. https://github.com/Casecommons/pg_search

The problem is that sometimes the order of the returned results seems to be wrong according to the definition of trigram search, which shows the gem documentation:

Trigram search works by counting the number of three-letter substrings (or "trigrams") between the query and the text.

My application receives string input from the user ("111 Streetname") and returns a list of addresses matching the Address.full_string value, with a rough search using a trigram.

List of search examples

Search for trigrams: "1493 cambrid"

  • Results:
    • 100 Cambridgeside Pl
    • 100 Cambridgeside Pl
    • 150 Cambridgepark Dr
    • 1575 Cambridge St
    • 1573 Cambridge St
    • 1493 Cambridge St

Search for trigrams: "1493 cambr"

  • Result:
    • 1493 Cambridge St

Search for trigrams: "1493 cambri"

  • Results:
    • 1575 Cambridge St
    • 1573 Cambridge St
    • 1493 Cambridge St

Search for trigrams: "1493 cambridge"

  • Results:
    • 1493 Cambridge St
    • 5 Cambridgepark Dr
    • 7 Cambridgepark Dr
    • 100 Cambridgeside Pl
    • and much more

Question

¿Why isn't "1493 Cambridge St" always on top of the results? ¿Do I need to modify the trigram search query or is this just the way the algorithm works?

Request example

SELECT "addresses".*, (ts_rank((to_tsvector('simple', coalesce("addresses"."full_string"::text, ''))), (to_tsquery('simple', ''' ' || '1493' || ' ''') && to_tsquery('simple', ''' ' || 'cambridge' || ' ''')), 0)) AS pg_search_rank FROM "addresses" WHERE (((coalesce("addresses"."full_string"::text, '')) % '1493 cambridge')) ORDER BY pg_search_rank DESC, "addresses"."id" ASC

      

+3


source to share


1 answer


While you're in the tutorial on finding trigrams , you're actually working with a function ts_rank()

from text search .

If you order results

(addresses.full_string <-> '1493 cambridge')

      



... you get what you ask for.
<->

is the operator of the trigram distance.

You can also use the %

("similarity") operator in a sentence WHERE

. Ideally, you would have a GiST index with an gist_trgm_ops

in column for this.

+3


source







All Articles