Fuzzy logic search for fully qualified names

I'm looking for a search engine that can use fuzzy logic algorithms to find matches among millions of hotel names. The idea is to find or suggest hotel names even when the input is misspelled or the words are not in the expected order.

I tried building a Foxx app myself with ArangoDB, using the clj-fuzzy library. For each record in the freetext collection, the app applies one algorithm (Metaphone, Double Metaphone, Soundex, NYSIIS, Caverphone, Cologne Phonetic, or MRA codex) and stores the resulting codes in the fullcode attribute. A FULLTEXT index is created on this attribute, and the following AQL query is used:

/*
Example using doubleMetaphone
-----------------------------
Hotel: Four (FR) Points (PNTS) By (P) Sheraton (XRTN) Daning (TNNK)
Input: Sheraton (XRTN) Points (PNTS)
*/
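// Candidates: hotels whose fullcode contains either input code.
// Note: UNION keeps duplicates, so a hotel matching both codes is returned twice.
for h in FLATTEN(UNION(
    (return FULLTEXT(fte_hotels, "fullcode", "XRTN")),
    (return FULLTEXT(fte_hotels, "fullcode", "PNTS"))
))
// +10 when all input codes match, plus 1 for each individual code that matches
let score = (CONTAINS(h.fullcode, "XRTN") && CONTAINS(h.fullcode, "PNTS") ? 10:0) +
            (CONTAINS(h.fullcode, "XRTN") ? 1:0) +
            (CONTAINS(h.fullcode, "PNTS") ? 1:0)
sort score desc
limit 10
return { hotel: h, score: score }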
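To make the ranking logic of this query explicit, here is a plain-Python sketch of the same scoring outside the database. It is only a model under stated assumptions: the FULLTEXT lookup is approximated by whole-word matching on the code string, CONTAINS by a substring test, and every hotel except the Four Points example above (and all its codes) is made up for illustration. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: hotel name -> space-separated phonetic codes.
# Only the first entry comes from the example above; the others are made up.
hotels: Dict[str, str] = {
    "Four Points By Sheraton Daning": "FR PNTS P XRTN TNNK",
    "Hypothetical Hotel 2": "XRTN TST",
    "Hypothetical Hotel 3": "PNTS",
    "Hypothetical Hotel 4": "FR KSNS",
}


def score_hotel(fullcode: str, query_codes: List[str]) -> int:
    """Same formula as the AQL: +10 if every code occurs, +1 per occurring code."""
    hits = [code in fullcode for code in query_codes]  # substring test, like CONTAINS
    return (10 if all(hits) else 0) + sum(hits)


def rank_hotels(codes_by_hotel: Dict[str, str], query_codes: List[str],
                limit: int = 10) -> List[Tuple[str, int]]:
    # Candidates: hotels with at least one whole-word match (stands in for FULLTEXT).
    # Iterating over the dict yields each hotel once, so there are no duplicates.
    candidates = [name for name, code in codes_by_hotel.items()
                  if any(c in code.split() for c in query_codes)]
    scored = [(name, score_hotel(codes_by_hotel[name], query_codes))
              for name in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # like SORT score DESC
    return scored[:limit]                                 # like LIMIT 10
```

With the input codes XRTN and PNTS, the Four Points hotel scores 12 (10 for matching both, plus 1 for each code), the hotels matching only one code score 1, and the hotel matching neither is never a candidate.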

Edit: Are there any other suggestions on how to implement this with SQL Server?

Thanks



1 answer


What you have been doing looks good. If you want a fuzzy word search, you need to preprocess the words with a specialized algorithm.
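As an illustration of that kind of preprocessing, here is a minimal Python sketch of American Soundex, one of the algorithms listed in the question. It is not clj-fuzzy's implementation, and phonetic_code is a hypothetical helper that encodes each word of a name separately. Because vowels are dropped and similar consonants share a digit, a misspelling such as "Sheratn" or swapped vowels such as "Pionts" map to the same code as the correct word.

```python
from typing import Dict

# American Soundex digit groups; vowels, Y, H and W have no digit.
SOUNDEX_DIGITS: Dict[str, str] = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Sheraton" -> "S635"."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]                          # keep the first letter as-is
    prev = SOUNDEX_DIGITS.get(letters[0], "")  # its digit still blocks a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                           # H/W do not separate equal digits
        digit = SOUNDEX_DIGITS.get(ch, "")     # vowels and Y give "" ...
        if digit and digit != prev:
            code += digit
        prev = digit                           # ... which lets the next digit repeat
    return (code + "000")[:4]                  # pad or truncate to 4 characters


def phonetic_code(name: str) -> str:
    """Hypothetical helper: Soundex-encode every word of a hotel name."""
    return " ".join(c for c in (soundex(w) for w in name.split()) if c)
```

Stored per hotel, a string like phonetic_code("Four Points By Sheraton") plays the same role as the fullcode attribute in the question.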

A slight query optimization: if you prefer, you can search for both terms in a single FULLTEXT call.

The following block



for h in FLATTEN(UNION(
    (return FULLTEXT(fte_hotels, "fullcode", "XRTN")),
    (return FULLTEXT(fte_hotels, "fullcode", "PNTS"))
))
...

can be replaced with the following, somewhat simpler expression:

for h in FULLTEXT(fte_hotels, "fullcode", "XRTN,|PNTS")
...
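The combined query relies on ArangoDB's fulltext search syntax: as I understand it, terms are comma-separated and ANDed by default, a "|" prefix makes a term an OR, and a "-" prefix excludes it. Below is a toy Python model of that behaviour (not ArangoDB's implementation; fulltext_model and the sample documents are hypothetical). Because the result is a set, each hotel appears once, whereas FLATTEN(UNION(...)) returns a hotel that matches both codes twice.

```python
from typing import Dict, Set


def fulltext_model(docs: Dict[str, str], query: str) -> Set[str]:
    """Toy model of a FULLTEXT query string such as "XRTN,|PNTS".

    Terms are comma-separated; a term may be prefixed with '+' (AND, the
    default), '|' (OR) or '-' (exclude). Terms are applied left to right and
    the first term seeds the result. 'prefix:'/'complete:' modifiers and the
    index's word-length limits are not modelled.
    """
    # Inverted index: lower-cased word -> keys of the documents containing it
    index: Dict[str, Set[str]] = {}
    for key, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(key)

    result: Set[str] = set()
    for position, raw in enumerate(query.split(",")):
        term = raw.strip()
        op = "+"
        if term and term[0] in "+|-":
            op, term = term[0], term[1:]
        matches = index.get(term.lower(), set())
        if position == 0:
            result = set(matches)  # copy so the index sets are never mutated
        elif op == "|":
            result |= matches
        elif op == "-":
            result -= matches
        else:
            result &= matches
    return result


# Hypothetical documents keyed by hotel id, values are phonetic codes
docs = {
    "four_points": "FR PNTS P XRTN TNNK",
    "hotel_2": "XRTN TST",
    "hotel_3": "PNTS",
    "hotel_4": "FR KSNS",
}
```

In this model "XRTN,|PNTS" returns four_points, hotel_2 and hotel_3, while "XRTN,PNTS" (implicit AND) returns only four_points.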
