How can I remove duplicate dicts in a list, ignoring one dict key?

I have a list of dictionaries. Each dictionary has several key-value pairs plus one arbitrary (but important) key-value pair. For example:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

      

I would like to remove duplicate dictionaries, where two dictionaries count as duplicates if they match on everything except the ignored key. I've seen a related question, but it only considers completely identical dicts. Is there a way to remove the near-duplicates so that the above data becomes

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

      

It doesn't matter which duplicate is dropped. How can I do this?

+3




7 replies


Keep a set of the values already seen for key and k2, and remove any dict whose values are already in it:

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

      

If duplicates are always adjacent, you can use groupby keyed on the values of key and k2 and take the first dict from each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist,itemgetter("key","k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]

      

Or using a generator like DSM's answer to modify the original list without copying:



def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

 [{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

      

If you don't care which duplicate is removed, just iterate in reverse:

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
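
As a hedged alternative to the loop above (not part of the original answer): if keeping the last duplicate is acceptable, a dict keyed on the compared values does the same job without repeatedly calling remove on the list. The helper name dedupe_keep_last and its keys parameter are made up for illustration, and the order of the result relies on dicts preserving insertion order (Python 3.7+).

```python
def dedupe_keep_last(dicts, keys=("key", "k2")):
    """Return a new list with one dict per distinct combination of `keys`.

    Later dicts overwrite earlier ones, so the last duplicate survives.
    `dedupe_keep_last` and `keys` are illustrative names, not from the answer.
    """
    by_vals = {tuple(d[k] for k in keys): d for d in dicts}
    return list(by_vals.values())


thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

thelist[:] = dedupe_keep_last(thelist)
print(thelist)
```

With the sample data this leaves the value1 dict and the value2 dict whose ignore_key is arb113.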

      

To compare on everything except ignore_key using groupby:

from itertools import groupby

thelist[:] = [next(v) for _, v in groupby(thelist, lambda d: 
                [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
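
Note that groupby only merges runs of adjacent items, so the groupby versions leave duplicates in place when other dicts sit between them. A minimal sketch of one way around that, assuming the compared values can be ordered (plain strings here): sort on the same key function first. The name compare_key is hypothetical, and sorting changes the original order of the list.

```python
from itertools import groupby

IGNORED = "ignore_key"


def compare_key(d):
    # Sorted (key, value) pairs for everything except the ignored key, so the
    # result does not depend on the order the keys were inserted in.
    return sorted((k, v) for k, v in d.items() if k != IGNORED)


thelist = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},  # duplicate, not adjacent
]

# sorted() is stable, so within each group the earliest dict still comes first.
ordered = sorted(thelist, key=compare_key)
thelist[:] = [next(group) for _, group in groupby(ordered, key=compare_key)]
print(thelist)
```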

      

+5




You can cram things into a line or two, but I think it's cleaner just to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

      

which gives



>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

      

This assumes your values are hashable. (If not, use seen = [] and seen.append(index), and build index as a list of pairs rather than a frozenset, since frozenset also needs hashable members; for long lists this will perform poorly.)
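
For values that are not hashable (lists, nested dicts), a rough sketch of that list-based fallback might look like the following. Because frozenset also requires hashable members, this version builds the index as a key-sorted list of pairs; f_unhashable is a made-up name, and the keys are assumed to be sortable (strings here).

```python
def f_unhashable(seq, ignore_keys):
    """Like f above, but tolerates unhashable values such as lists."""
    seen = []  # list instead of set: each membership test is a linear scan
    for elem in seq:
        index = sorted(
            ((k, v) for k, v in elem.items() if k not in ignore_keys),
            key=lambda kv: kv[0],  # order by key only; values may not be comparable
        )
        if index not in seen:
            yield elem
            seen.append(index)


data = [
    {"key": ["a", "b"], "ignore_key": "arb1"},
    {"key": ["a", "b"], "ignore_key": "arb11"},
    {"key": ["c"], "ignore_key": "arb113"},
]
print(list(f_unhashable(data, ["ignore_key"])))
```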

+2




Running with original list:

thelist = [
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

      

Create a set and populate it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
    uniques.add(cur)

Finally, rebind the name to the filtered list:

thelist = theNewList

      

+1




Instead of using a list of dicts, you can use a dict of dicts. The value of key in each of your dicts becomes the key in the main dict.

Like this:

thedict = {}

thedict["value1"] = {"ignore_key" : "arb1", ...}  
thedict["value2"] = {"ignore_key" : "arb11", ...}

      

Since the dict will not allow duplicate keys, your problem will not exist.
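
A small sketch of how such a dict of dicts could be built from the existing list, assuming the main key is the tuple of the key and k2 values (the name main_key is only illustrative). Plain assignment would keep the last duplicate; setdefault keeps the first.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

thedict = {}
for d in thelist:
    main_key = (d["key"], d["k2"])
    # setdefault stores d only if main_key is not present yet,
    # so the first dict seen for each key wins.
    thedict.setdefault(main_key, d)

print(list(thedict.values()))
```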

0




Without changing thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result)

      

0




Create a set of unique values and check (and update) it:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist

      

0




You can adapt the accepted answer to a related question by using a dictionary instead of a set to remove the duplicates.

The following example creates a temporary dictionary whose keys are a tuple of the items in each dictionary in thelist except the ignored one, which is stored as the value associated with each of those keys. This eliminates duplicates because they become the same key, but retains the ignored key and its value (the last one seen, or the only one).

The second step recreates thelist by building dictionaries from each key plus its associated value in the temporary dictionary.

You could combine these two steps into a completely unreadable one-liner if you like ...

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

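IGNORED = "ignore_key"
# Map the tuple of non-ignored items to the (ignored key, value) pair;
# duplicates collapse onto the same dict key and the last one wins.
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
# Rebuild each dict from its non-ignored items plus the stored ignored pair.
thelist = [dict(key + (value,)) for key, value in temp.items()]

for item in thelist:
    print(item)

Output:

{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb113'}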

      

0








