How do I compare multiple key values from a list of dictionaries?

I have a list of dictionaries that all share the same structure. For example:

test_data = [{'id':1, 'value':'one'}, {'id':2, 'value':'two'}, {'id':3, 'value':'three'}]

      

What I need to do is compare these dictionaries and return the "similar" ones based on a key-value pair. For example, given the key 'value' and the value 'oen', I want to find all dictionaries whose 'value' is nearly the same as 'oen', which in this case would be [{'id':1, 'value':'one'}].

difflib has a function, get_close_matches, that is close to what I need. I can extract the values of a particular key using a list comprehension and then compare those values with my search term:

from difflib import get_close_matches

values = [item['value'] for item in test_data]
found_vals = get_close_matches('oen', values)  # returns ['one']

      

What I need is to take this one step further and tie each match back to its original dictionary:

In  [1]: get_close_dicts('oen', test_data, 'value')
Out [1]: [{'id':1, 'value':'one'}]

      


Note: the list of dictionaries is quite long, so I'm hoping for something as efficient and fast as possible.



3 answers


You can build a reverse lookup dict before running the matching on your data, so that once get_close_matches returns the set of matching values, you can use them to find the matching dict(s).

If the values stored under the 'value' key are unique, you can do:

reverselookup = {thedict['value']:thedict for thedict in test_data}

      

If, however, you need to handle the case where several dicts have the same value for the 'value' key, then you need to keep all of them (this gives you a dict where each key is a value found under 'value' and each value is the list of dicts that have it):

from collections import defaultdict
reverselookup = defaultdict(list)
for testdict in test_data:
    reverselookup[testdict['value']].append(testdict)

      

For example, if your test data had an additional dict in it, like:



>>> test_data = [{'id':1, 'value':'one'}, {'id':2, 'value':'two'}, 
                 {'id':3, 'value':'three'}, {'id':4, 'value':'three'}]

      

Then the above reverse lookup construct will give you this:

{
  "three": [
    {
      "id": 3,
      "value": "three"
    },
    {
      "id": 4,
      "value": "three"
    }
  ],
  "two": [
    {
      "id": 2,
      "value": "two"
    }
  ],
  "one": [
    {
      "id": 1,
      "value": "one"
    }
  ]
}

      

Then, once you have your matching values, just look up the dicts. In the second case each lookup returns a list, so you can chain the lists together; in the first (unique values) case no chaining is needed:

from itertools import chain
list(chain.from_iterable(reverselookup[val] for val in found_vals))
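Putting these pieces together, a minimal sketch of the get_close_dicts function from the question might look like the following. It assumes the multi-value case (a defaultdict of lists); the helper name build_reverse_lookup is made up for illustration, and the lookup is rebuilt on every call only for brevity. For a long list you would build it once and reuse it.

```python
from collections import defaultdict
from difflib import get_close_matches
from itertools import chain


def build_reverse_lookup(data, key):
    """Map each value found under `key` to the list of dicts that carry it."""
    lookup = defaultdict(list)
    for item in data:
        lookup[item[key]].append(item)
    return lookup


def get_close_dicts(word, data, key):
    """Return the dicts whose `key` value is a close match for `word`."""
    lookup = build_reverse_lookup(data, key)
    # Dict keys are unique, so each distinct value is compared only once.
    found_vals = get_close_matches(word, list(lookup))
    # Flatten the per-value lists into a single list of dicts.
    return list(chain.from_iterable(lookup[val] for val in found_vals))


test_data = [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'},
             {'id': 3, 'value': 'three'}, {'id': 4, 'value': 'three'}]
result = get_close_dicts('oen', test_data, 'value')
print(result)
```

Because matching runs over the distinct values rather than over every dict, repeated values cost nothing extra at query time.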

      



You can:

return [d for d in test_data if get_close_matches('oen', [d['value']])]

      



Note that get_close_matches can return more than one result.
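Wrapped in a function, that filter could be sketched as below. The function name and signature come from the question; exposing cutoff as a parameter is an assumption for illustration, with 0.6 being the documented default of get_close_matches.

```python
from difflib import get_close_matches


def get_close_dicts(word, data, key, cutoff=0.6):
    """Keep every dict whose `key` value is similar enough to `word`."""
    # get_close_matches returns an empty (falsy) list when the single
    # candidate scores below `cutoff`, so the dict is filtered out.
    return [d for d in data
            if get_close_matches(word, [d[key]], n=1, cutoff=cutoff)]


test_data = [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'},
             {'id': 3, 'value': 'three'}]
print(get_close_dicts('oen', test_data, 'value'))
```

Because each call sees a single candidate, the result is not capped by the default n=3 of get_close_matches, but every query still scans the whole list.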



Regardless, at some point you will end up iterating through every dictionary; there is no way around that. What you can do is get all of that work done in a preprocessing stage so that your actual function calls return immediately.

As ValAyal mentioned, a good reverse lookup dictionary is what you need here. I picture a dictionary value_dict where each key is a value from the original dictionaries and each value holds both the exact and the close matches for that key. Take this example, where d1 and d2 are both in your list. If:

d1 = {'id':1, 'value':'one'}
d2 = {'id':3, 'value':'oen'}

      

Then:

value_dict["one"] = {"exact": [d1], "close": [d2]}
value_dict["oen"] = {"exact": [d2], "close": [d1]}

      

Whenever you insert a dictionary whose value you have already seen, you can immediately identify all of its exact and close matches (just by looking up that value) and add it to the various lists accordingly. If it has a new value that has not been seen before, you will have to compare it against all values currently in value_dict. For example, if you want to add

d3 = {'id':5, 'value':'one'}

      

You would look up value_dict["one"] and get its exact and close lists. These lists lead you to all the other entries of value_dict that need to be updated. You need to add d3 to the exact matches of one and to the close matches of oen; you can get both of these values from the returned lists. As a result, you end up with

value_dict["one"] = {"exact": [d1, d3], "close": [d2]}
value_dict["oen"] = {"exact": [d2], "close": [d1, d3]}

      

So, once all this preprocessing is done, your function becomes simpler: something like get_close_dicts(val) (I don't know what the third argument does in your example) might just do return value_dict[val]["exact"] + value_dict[val]["close"]. You now have a function that answers immediately.

The preprocessing step is fairly tricky, but the resulting speedup in get_close_dicts will hopefully make up for it. I can elaborate when I get back from work if you want to know how to implement this. Hopefully this gives you a good idea of a useful data structure, and I haven't badly underestimated the problem.
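One possible way to sketch the preprocessing described above is shown here. The helper names is_close and build_value_dict, and the use of difflib.SequenceMatcher with a 0.6 threshold as the similarity test, are assumptions for illustration; the answer does not say how closeness should be measured.

```python
from difflib import SequenceMatcher


def is_close(a, b, cutoff=0.6):
    """Assumed similarity test: SequenceMatcher ratio at or above `cutoff`."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def build_value_dict(data, key):
    """Precompute the exact and close matches for every distinct value."""
    value_dict = {}
    for item in data:
        val = item[key]
        if val not in value_dict:
            # New value: compare it against every value seen so far.
            entry = {"exact": [], "close": []}
            for other, other_entry in value_dict.items():
                if is_close(val, other):
                    entry["close"].extend(other_entry["exact"])
            value_dict[val] = entry
        entry = value_dict[val]
        # The values close to `val` can be read off the dicts in its close list.
        for other in {d[key] for d in entry["close"]}:
            value_dict[other]["close"].append(item)
        entry["exact"].append(item)
    return value_dict


def get_close_dicts(val, value_dict):
    """Return the precomputed exact and close matches for a known value."""
    entry = value_dict.get(val)
    if entry is None:
        return []  # value never seen during preprocessing
    return entry["exact"] + entry["close"]


d1 = {'id': 1, 'value': 'one'}
d2 = {'id': 3, 'value': 'oen'}
d3 = {'id': 5, 'value': 'one'}
value_dict = build_value_dict([d1, d2, d3], 'value')
print(get_close_dicts('one', value_dict))
```

Note that this sketch only answers queries for values that already occur in the data; a search term that was never seen would still need one pass over the distinct values.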
