Get close matches for multiple words in a dictionary

Question

I have a dictionary with the following structure:

{
    1: {"names": ["name1_A", "name1_B", ...]},
    2: {"names": ["name2_A", "name2_B", ...]},
    ...
}

where name1_A and name1_B are synonyms / aliases / different ways of writing the same name, whose identifier is 1. name2_A and name2_B are aliases of the same name, whose identifier is 2, and so on.

I need to write a function that takes user input and returns the ID of the name whose alias most closely resembles that input.

I know it's not very intuitive what I mean, so here's an example. Let's say this is my dictionary:

{
    1: {"names": ["James", "Jamie"]},
    2: {"names": ["Karen", "Karyn"]}
}

The user enters the word Jimmy. Since the closest match to Jimmy in the dictionary is Jamie, the function should return an ID of 1.

If the user types in the word Karena, the closest match is Karen, so the function should return an ID of 2.

I think the best way to get the closest match is to use difflib.get_close_matches(). However, this function takes a list of possibilities as an argument, and I cannot think of how to use it correctly in my function. Any help would be appreciated.

python string dictionary fuzzy-search

user2747949 June 29, 2017 at 21:09

1 answer

coldspeed · Answer 1 · 2017-06-29T21:39:40+0000

If you are open to third-party modules, there is a nice little module called fuzzywuzzy that I like to use for this kind of thing: fuzzy string matching in Python. It uses the Levenshtein distance metric to calculate the distance between two strings. Here's an example of how you use it:

>>> from fuzzywuzzy import fuzz
>>> from functools import partial
>>> data_dict = {
...     1: {"names": ["James", "Jamie"]},
...     2: {"names": ["Karen", "Karyn"]}
... }
>>> input_str = 'Karena'
>>> f = partial(fuzz.partial_ratio, input_str)
>>> matches = { k : max(data_dict[k]['names'], key=f) for k in data_dict}
>>> matches
{1: 'James', 2: 'Karen'}
>>> { i : (matches[i], f(matches[i])) for i in matches }
{1: ('James', 40), 2: ('Karen', 100)}

Now you can pick Karen, since it has the highest score, and return its key (2).

I had to call the scoring function twice for this demo, but you should be able to call it just once, depending on how you extend this example.
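As a rough sketch of that extension, the function below scores every alias once and returns the ID with the best score. The names `similarity` and `closest_id` are hypothetical, and the scorer is a stand-in built on the standard library's difflib so that the snippet runs without installing anything; if fuzzywuzzy is available, `fuzz.ratio` or `fuzz.partial_ratio` could be passed as `scorer` instead.

```python
from difflib import SequenceMatcher

data_dict = {
    1: {"names": ["James", "Jamie"]},
    2: {"names": ["Karen", "Karyn"]},
}


def similarity(a, b):
    # Stand-in scorer (an assumption, not fuzzywuzzy itself):
    # case-insensitive difflib ratio scaled to the 0-100 range.
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def closest_id(user_input, data, scorer=similarity):
    # Score each (id, alias) pair exactly once and keep the best one.
    # Ties on score are broken by the larger id, then the larger alias,
    # simply because tuples compare element by element.
    best_score, best_id, best_alias = max(
        (scorer(user_input, alias), key, alias)
        for key, entry in data.items()
        for alias in entry["names"]
    )
    return best_id, best_alias, best_score


best_id, best_alias, best_score = closest_id("Karena", data_dict)
print(best_id, best_alias, best_score)
```

With the sample dictionary, "Karena" resolves to ID 2 through the alias Karen, and "Jimmy" resolves to ID 1.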

Another note: fuzz.partial_ratio is more lenient with its matches. For a stricter matching scheme, consider using fuzz.ratio.
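If a third-party module is not an option, the difflib.get_close_matches() function mentioned in the question can also be made to work, since it compares whole strings and so behaves more like the stricter scheme. The sketch below is one possible way to do it, not the only one: it flattens the aliases into a hypothetical `alias_to_id` lookup, passes the aliases as the list of possibilities, and maps the best match back to its ID. It assumes no alias is shared between two IDs.

```python
from difflib import get_close_matches

data_dict = {
    1: {"names": ["James", "Jamie"]},
    2: {"names": ["Karen", "Karyn"]},
}


def closest_id_difflib(user_input, data, cutoff=0.6):
    # Flatten the nested dictionary into alias -> id so that a matched
    # alias can be mapped back to its identifier. If the same alias
    # appeared under two ids, the later one would silently win.
    alias_to_id = {
        alias: key
        for key, entry in data.items()
        for alias in entry["names"]
    }
    # n=1 asks for the single best match; aliases scoring below
    # `cutoff` (a 0.0-1.0 similarity ratio) are discarded.
    matches = get_close_matches(user_input, list(alias_to_id), n=1, cutoff=cutoff)
    return alias_to_id[matches[0]] if matches else None


print(closest_id_difflib("Karena", data_dict))             # 2
print(closest_id_difflib("Jimmy", data_dict))              # None
print(closest_id_difflib("Jimmy", data_dict, cutoff=0.3))  # 1
```

Note that with the default cutoff of 0.6, "Jimmy" is not considered close enough to any alias and the function returns None, so the cutoff needs to be lowered if such loose matches should still resolve to an ID.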

You can browse some more examples of fuzzy string matching here.
