Fuzzy logic search for fully qualified names
I'm looking for a search engine capable of using fuzzy logic algorithms to find matches across millions of hotel names. The idea is to be able to find or suggest hotel names when the input is misspelled or its words are not in the expected order.
I tried building a Foxx app myself with ArangoDB, using the clj-fuzzy library. For each record in the freetext collection, one algorithm (Metaphone, Double Metaphone, Soundex, NYSIIS, Caverphone, Cologne Phonetic, or MRA codex) is applied and the result is stored in the code attribute. A FULLTEXT index is created on this attribute, and this AQL query is used:
/*
Example using doubleMetaphone
-----------------------------
Hotel: Four (FR) Points (PNTS) By (P) Sheraton (XRTN) Daning (TNNK)
Input: Sheraton (XRTN) Points (PNTS)
*/
for h in FLATTEN(UNION(
    (return FULLTEXT(fte_hotels, "fullcode", "XRTN")),
    (return FULLTEXT(fte_hotels, "fullcode", "PNTS"))
))
    let score = (CONTAINS(h.fullcode, "XRTN") && CONTAINS(h.fullcode, "PNTS") ? 10 : 0) +
                (CONTAINS(h.fullcode, "XRTN") ? 1 : 0) +
                (CONTAINS(h.fullcode, "PNTS") ? 1 : 0)
    sort score desc
    limit 10
    return { hotel: h, score: score }
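To make the scoring rule explicit, here is a minimal Python sketch of the same ranking done outside the database. It assumes that fullcode holds the space-separated phonetic codes of a hotel name (for example "FR PNTS P XRTN TNNK"). The names score_hotel and top_matches and the sample records are made up for illustration; they are not part of ArangoDB or clj-fuzzy. The sketch also generalizes the rule to any number of input codes, which the query above does not do.

```python
def score_hotel(fullcode, input_codes):
    """Mirror the AQL score: +1 per input code found, +10 if all are found."""
    # Substring test, comparable to AQL CONTAINS(h.fullcode, code).
    hits = [code in fullcode for code in input_codes]
    bonus = 10 if hits and all(hits) else 0
    return bonus + sum(hits)


def top_matches(hotels, input_codes, limit=10):
    """Return up to `limit` hotels, highest score first."""
    scored = [
        {"hotel": h, "score": score_hotel(h["fullcode"], input_codes)}
        for h in hotels
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:limit]


# Hypothetical sample records; the codes are taken from the example above.
hotels = [
    {"name": "Sheraton", "fullcode": "XRTN"},
    {"name": "Four Points By Sheraton Daning", "fullcode": "FR PNTS P XRTN TNNK"},
]
ranked = top_matches(hotels, ["XRTN", "PNTS"])
for item in ranked:
    print(item["score"], item["hotel"]["name"])
```

With these two records, the hotel containing both codes scores 12 (10 + 1 + 1) and is listed first, while the one containing only XRTN scores 1.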
Edit: Any other suggestions on how to implement this with SQL Server?
Thanks.
What you have done looks good. A fuzzy word search does require preprocessing the words with a specialized algorithm.
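As an illustration of that preprocessing step, here is a minimal sketch of one such algorithm, classic American Soundex, written in plain Python. This is a stand-in chosen because it is short; it is not the clj-fuzzy implementation, and the example in the question uses Double Metaphone, which produces different codes. The helper encode_name is a hypothetical name for building the value that would be stored in the indexed attribute.

```python
# Letter groups of classic American Soundex.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


def encode_name(name):
    """Encode every word of a name; the result is what would be indexed."""
    return " ".join(soundex(w) for w in name.split())


# A misspelled input maps to the same code as the correct spelling.
print(soundex("Sheraton"), soundex("Sheriton"))
print(encode_name("Sheraton Points"))
```

Because each word is encoded on its own, the order of the words in the input does not matter once the codes are looked up individually.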
A slight query optimization: if you prefer, you can search for both search terms in a single FULLTEXT call.
The following block
for h in FLATTEN(UNION(
    (return FULLTEXT(fte_hotels, "fullcode", "XRTN")),
    (return FULLTEXT(fte_hotels, "fullcode", "PNTS"))
))
...
can be converted to the following somewhat simpler expression:
for h in FULLTEXT(fte_hotels, "fullcode", "XRTN,|PNTS")
...
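In the FULLTEXT query string, terms are separated by commas and a leading | on a term means logical OR, so "XRTN,|PNTS" matches records containing either code. If the codes come from user input, the query string can be assembled on the client and passed as a bind parameter. A minimal Python sketch follows; the function build_fulltext_query, the bind parameter name @query and the alphanumeric check are assumptions made for illustration, not something the ArangoDB API prescribes.

```python
def build_fulltext_query(codes):
    """Join phonetic codes into an OR query string such as 'XRTN,|PNTS'."""
    unique = list(dict.fromkeys(codes))  # drop duplicates, keep input order
    if not unique:
        raise ValueError("at least one code is required")
    for code in unique:
        # Reject anything that could be read as a FULLTEXT operator (, | + - :).
        if not code.isalnum():
            raise ValueError(f"unexpected characters in code: {code!r}")
    return ",|".join(unique)


# AQL text with the query string supplied as a bind parameter (not executed here).
AQL = 'for h in FULLTEXT(fte_hotels, "fullcode", @query) return h'

bind_vars = {"query": build_fulltext_query(["XRTN", "PNTS"])}
print(bind_vars["query"])
```

The same function works for any number of input words, so the AQL text does not have to be regenerated for each search.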