Python: merge two or more dicts, using the value to resolve duplicate keys

I am merging dictionaries that share some duplicate keys. The values will differ, and I want to drop the entry with the lower value.

dict1 = {1 :["in",1], 2 :["out",1], 3 :["in",1]}
dict2 = {1 :["out",2], 2 :["out",1]}

      

If the keys are equal, I want the entry whose value[1] is largest in the new dict. Merging these two dicts should result in:

dict3 = {1 :["out",2], 2 :["out",1], 3 :["in",1]}

      

The only way I know is to run a conditional loop to determine which one to add to the merged dict. Is there a more pythonic way to do this?

Duplicate keys will be few and far between, less than 1% of all keys, if that makes any difference to the final solution.

+3




5 answers


dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

vals = []
# collect dict1's items, taking dict2's entry for a common key when its value is larger
for k, v in dict1.items():  # items() works on both Python 2 and Python 3
    if k in dict2:
        if dict2[k][1] > v[1]:
            vals.append((k, dict2[k]))
        else:
            vals.append((k, v))
    else:
        vals.append((k, v))

new_d = {}
# start from everything in dict2
new_d.update(dict2)

# add dict1's items, overwriting common keys with the winning entry
for k, v in vals:
    new_d[k] = v
print(new_d)
# {1: ['out', 2], 2: ['out', 1], 3: ['in', 1]}

      

You can also copy and delete:

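cp_d1 = dict1.copy()
cp_d2 = dict2.copy()

# for each common key, delete the losing entry from its copy
for k, v in dict1.items():  # items() works on both Python 2 and Python 3
    if k in dict2:
        if dict2[k][1] > v[1]:
            del cp_d1[k]
        else:
            del cp_d2[k]
cp_d1.update(cp_d2)

print(cp_d1)
# {1: ['out', 2], 2: ['out', 1], 3: ['in', 1]}  (key order may differ)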

      



Some timings show that the copy-and-delete approach is the fastest and the groupby approach is the slowest:

In [9]: %%timeit
   ...: vals = []
   ...: cp_d1 = dict1.copy()
   ...: cp_d2 = dict2.copy()
   ...: for k, v in dict1.iteritems():
   ...:     if k in dict2:
   ...:         if dict2[k][1] > v[1]:
   ...:             del cp_d1[k]
   ...:         else:
   ...:             del cp_d2[k]
   ...: cp_d1.update(cp_d2)
   ...: 

1000000 loops, best of 3: 1.61 µs per loop
In [20]: %%timeit
   ....: vals = []
   ....: for k, v in dict1.iteritems():
   ....:     if k in dict2:
   ....:         if dict2[k][1] > v[1]:
   ....:             vals.append((k, dict2[k]))
   ....:         else:
   ....:             vals.append((k,v))
   ....:     else:
   ....:         vals.append((k,v))
   ....: new_d = {}
   ....: new_d.update(dict2)
   ....: for k,v in vals:
   ....:     new_d[k] = v
   ....: 
100000 loops, best of 3: 2.11 µs per loop


In [10]: %%timeit
   ....: {k: max(dict1.get(k), dict2.get(k), key=lambda x: x[1] if x else None)
   ....:  for k in dict1.viewkeys() | dict2.viewkeys()}
   ....: 
100000 loops, best of 3: 3.71 µs per loop

In [22]: %%timeit
   ....: l=dict2.items() +dict1.items() # if you are in python 3 use : list(dict1.items()) + list(dict2.items())
   ....: g=[list(g) for k,g in groupby(sorted(l),lambda x : x[0])]
   ....: dict([max(t,key=lambda x: x[1][1]) for t in g])
   ....: 
100000 loops, best of 3: 10.1 µs per loop


In [61]: %%timeit
   ....: conflictKeys = set(dict1) & set(dict2)  
   ....: solvedConflicts = { key: dict1[key] 
   ....:                       if dict1[key][1] > dict2[key][1] 
   ....:                       else dict2[key] 
   ....:                  for key in conflictKeys } 
   ....: result = dict1.copy()                     
   ....: result.update(dict2)                       
   ....: result.update(solvedConflicts)  
   ....: 

100000 loops, best of 3: 2.34 µs per loop
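
The figures above come from IPython's %%timeit on Python 2. To rerun such a comparison as a plain script, the standard-library timeit module can be used. The following is a minimal sketch for Python 3 covering only the fastest and slowest approaches; copy_and_delete and groupby_merge are hypothetical wrapper names, and the absolute numbers will vary by machine and Python version.

```python
import timeit
from itertools import groupby
from operator import itemgetter

dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}


def copy_and_delete(d1, d2):
    """Copy both dicts, drop the losing entry for each shared key, then merge."""
    cp_d1, cp_d2 = d1.copy(), d2.copy()
    for k, v in d1.items():
        if k in d2:
            if d2[k][1] > v[1]:
                del cp_d1[k]
            else:
                del cp_d2[k]
    cp_d1.update(cp_d2)
    return cp_d1


def groupby_merge(d1, d2):
    """Sort all (key, value) pairs by key, group them, keep the max per group."""
    pairs = sorted(list(d2.items()) + list(d1.items()), key=itemgetter(0))
    return dict(
        max(group, key=lambda kv: kv[1][1])
        for _, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    runs = 100000  # arbitrary repeat count; raise it for steadier figures
    for func in (copy_and_delete, groupby_merge):
        seconds = timeit.timeit(lambda: func(dict1, dict2), number=runs)
        print(f"{func.__name__}: {seconds / runs * 1e6:.2f} usec per call")
```

With dictionaries this small the measurement is dominated by fixed overhead, so it may be worth repeating it on data closer to the real sizes before choosing an approach.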

      

+1




A single dictionary comprehension can handle this.



from operator import itemgetter

{k: max(dict1.get(k, (None, float('-Inf'))),
        dict2.get(k, (None, float('-Inf'))),
        key=itemgetter(1))
 for k in dict1.viewkeys() | dict2.viewkeys()}
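
Note that viewkeys() exists only on Python 2. On Python 3, keys() already returns a set-like view, so the same idea can be sketched as follows. The names merge_max and MISSING are hypothetical, introduced only for this illustration.

```python
from operator import itemgetter

# Placeholder for a key that is absent from one dict. Its [1] element is
# lower than any finite number, so the other dict's entry wins.
MISSING = (None, float("-inf"))


def merge_max(d1, d2):
    """Merge two dicts, keeping the entry with the larger [1] element per key."""
    return {
        k: max(d1.get(k, MISSING), d2.get(k, MISSING), key=itemgetter(1))
        for k in d1.keys() | d2.keys()
    }


dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}
print(merge_max(dict1, dict2))
```

On a tie, max() returns its first argument, so the entry from the first dictionary is kept.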

      

+2




A Pythonic solution should lean heavily on the Python standard library and the built-in syntax, not only to simplify your code but also to improve performance.

In your case, you can take advantage of the fact that only 1% of the keys occur in both dictionaries:

conflictKeys = set(dict1) & set(dict2)      # keys that are in both dictionaries
solvedConflicts = { key: dict1[key]
                         if dict1[key][1] > dict2[key][1]
                         else dict2[key]
                    for key in conflictKeys }  # conflicting keys only, each with its wanted value

result = dict1.copy()                       # start with all of dict1
result.update(dict2)                        # add all of dict2 (conflicts are overwritten for now)
result.update(solvedConflicts)              # overwrite conflicting keys with the resolved values

      

This solution avoids running the "slow" Python interpreter for every key of the two dictionaries and instead relies on fast built-in routines (which are written in C), namely:

  • dict.update() to combine both dictionaries
  • set.intersection() (equivalent to set1 & set2) to get all conflicts

Only for resolving the conflicting keys does the Python interpreter have to iterate over the entries. Even here you can still benefit, in terms of performance, from the Pythonic dict comprehension rather than an imperative for loop: the comprehension typically adds each item with somewhat less per-item interpreter overhead than a loop that assigns into solvedConflicts one key at a time.
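
To illustrate the three steps on data shaped like the question describes, here is a minimal sketch that wraps them in a function and checks the result against a plain loop. The function names and the synthetic dictionaries (1000 keys each, 10 of them shared) are assumptions made for the example.

```python
def merge_by_conflicts(d1, d2):
    """Bulk-merge with dict.update(), then fix up only the shared keys."""
    conflict_keys = set(d1) & set(d2)
    solved = {
        k: d1[k] if d1[k][1] > d2[k][1] else d2[k]
        for k in conflict_keys
    }
    result = d1.copy()
    result.update(d2)
    result.update(solved)
    return result


def merge_by_loop(d1, d2):
    """Reference implementation: visit every key of d1 and compare explicitly."""
    result = dict(d2)
    for k, v in d1.items():
        if k not in d2 or v[1] > d2[k][1]:
            result[k] = v
    return result


# Synthetic data: keys 0..999 and 990..1989, so only 990..999 overlap.
big1 = {k: ["in", k % 7] for k in range(1000)}
big2 = {k: ["out", k % 5] for k in range(990, 1990)}

merged = merge_by_conflicts(big1, big2)
print(len(merged), merged == merge_by_loop(big1, big2))
```

Both functions keep the second dictionary's entry on a tie, which matches the `>` comparison used in the answer.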

+2




Using sets can also help if the overlap between the keys is as small as stated:

def out_dict(dict1, dict2):
    dict3 = {}
    s1 = set(dict1)
    s2 = set(dict2)
    for i in s1-s2:
        dict3[i] = dict1[i]
    for i in s2-s1:
        dict3[i] = dict2[i]
    for i in s1.intersection(s2):
        # compare the [1] elements, not the whole lists
        dict3[i] = dict1[i] if dict1[i][1] >= dict2[i][1] else dict2[i]
    return dict3

      

The set difference selects the keys that appear in only one of the dictionaries, and the intersection selects the keys common to both.
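
As a small illustration of how the three set expressions partition the keys, this is what they yield for the dictionaries from the question. It is only a sketch; the variable names only_in_1, only_in_2 and in_both are made up for the example.

```python
dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

s1, s2 = set(dict1), set(dict2)

only_in_1 = s1 - s2            # keys copied from dict1 unchanged
only_in_2 = s2 - s1            # keys copied from dict2 unchanged
in_both = s1.intersection(s2)  # keys that need the [1] comparison

print(only_in_1, only_in_2, in_both)
# {3} set() {1, 2}
```

Every key lands in exactly one of the three groups, so the three loops in out_dict() never write the same key twice.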

+1




import operator

def choose_value(key, x, y):
    """Choose a value from either `x` or `y` per the problem requirements."""
    if key not in x:
        return y[key]
    if key not in y:
        return x[key]
    # "The maximum of x[key] and y[key], ordered by their [1] element"
    return max((x[key], y[key]), key=operator.itemgetter(1))

def merge(x, y):
    # "a dict mapping keys to the chosen value, using the union of the keys
    # from x and y as the result keys"
    return {
        key: choose_value(key, x, y)
        for key in x.keys() | y.keys()
    }
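
The merge() above takes exactly two dictionaries. Since the question asks about two or more, one possible extension is to fold it over a sequence with functools.reduce. The sketch below restates the two functions so that it runs on its own (Python 3); merge_all and dict4 are hypothetical names added for the example.

```python
import operator
from functools import reduce


def choose_value(key, x, y):
    """Pick the entry for `key`, preferring the larger [1] element."""
    if key not in x:
        return y[key]
    if key not in y:
        return x[key]
    return max((x[key], y[key]), key=operator.itemgetter(1))


def merge(x, y):
    """Merge two dicts over the union of their keys."""
    return {key: choose_value(key, x, y) for key in x.keys() | y.keys()}


def merge_all(*dicts):
    """Fold merge() over any number of dicts, starting from an empty dict."""
    return reduce(merge, dicts, {})


dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}
dict4 = {3: ["out", 5], 4: ["in", 0]}  # hypothetical third input

print(merge_all(dict1, dict2))
print(merge_all(dict1, dict2, dict4))
```

Each step builds a new dictionary, so for a long list of large inputs it may be cheaper to update a single result dictionary in place instead.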

      

0

