How can I remove a duplicate dict in a list, ignoring the dict key?

Question


I have a list of dictionaries. Each dictionary has several key-value pairs and one arbitrary (but important) key-value pair. For example:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

I would like to remove duplicate dictionaries so that only the value of ignore_key is ignored when comparing them. I've seen a related question, but it only considers completely identical dicts. Is there a way to remove the near-duplicates so that the above data becomes

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

It doesn't matter which duplicate is removed. How can I do this?


python python-2.7

user4467853 May 17 '15 at 14:19


7 replies

Padraic cunningham · Answer 1 · 2015-05-17T14:34:51+0000

Keep a set of the values already seen for "key" and "k2", and remove any dict whose values have already been seen:

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If dicts with the same values are always grouped together (adjacent), you can use itemgetter to group by "key" and "k2" and take the first dict from each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist, itemgetter("key", "k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
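groupby only merges consecutive items, so the approach above misses duplicates that are not adjacent. One option is to sort by the same key first. This is a sketch that assumes the "key" and "k2" values are mutually comparable, as strings are. Note that it reorders the list.

```python
from itertools import groupby
from operator import itemgetter

thelist = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

group_key = itemgetter("key", "k2")  # the fields that define a duplicate

# sorted() is stable, so within each group the original relative order is kept
# and next(v) returns the earliest occurrence of each duplicate.
thelist[:] = [next(v) for _, v in groupby(sorted(thelist, key=group_key), group_key)]
print(thelist)
```

Sorting costs O(n log n). It is only worth doing when you also want the output ordered by those fields.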

Or use a generator, as in DSM's answer, to update the original list in place:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]
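The generator can be generalized by passing in the function that extracts the identifying values. This is a sketch modeled on the unique_everseen recipe from the itertools documentation. The name unique_by is hypothetical and is not a stdlib function. The key function must return something hashable.

```python
from operator import itemgetter

def unique_by(iterable, key):
    """Yield each item whose key(item) has not been seen before (first one wins).

    Hypothetical helper modeled on the itertools unique_everseen recipe;
    key(item) must return a hashable value.
    """
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

# Slice assignment consumes the whole generator before replacing the contents.
thelist[:] = unique_by(thelist, itemgetter("key", "k2"))
print(thelist)
```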

If you don't care which duplicate is removed, just iterate in reverse (this keeps the last occurrence instead of the first):

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
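Another way to keep the last occurrence avoids both removing items during iteration and the linear cost of each list.remove call. You can build a dict keyed by the identifying values. This sketch assumes Python 3.7+, where dicts preserve insertion order. A re-assigned key keeps the position of its first insertion.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

# Later dicts overwrite earlier ones with the same ("key", "k2") pair,
# so the last duplicate wins; the order of first appearance is preserved.
latest = {(d["key"], d["k2"]): d for d in thelist}
thelist[:] = list(latest.values())
print(thelist)
```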

To compare on all keys except ignore_key, using groupby:

from itertools import groupby

# Compare dicts rather than lists, so the grouping does not depend on d.items() order
thelist[:] = [next(v) for _, v in groupby(thelist, lambda d:
                {k: val for k, val in d.items() if k != "ignore_key"})]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]

DSM · Answer 2 · 2015-05-17T14:39:54+0000

You could cram this into a line or two, but I think it's clearer to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

which gives

>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

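This assumes your values are hashable. (If they aren't, you can use seen = [] and seen.append(index) instead. You also need to build index as a dict rather than a frozenset, because a frozenset needs hashable values too. This works, although it will perform poorly on long lists.)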
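For values that are not hashable (for example lists), a version along the lines of that parenthetical might look like this. The name f_unhashable is hypothetical. Every membership test scans the seen list, so this is O(n**2) overall.

```python
def f_unhashable(seq, ignore_keys):
    # Hypothetical variant of f() for unhashable values: the filtered dicts
    # are compared with ==, so nothing needs to be hashed.
    seen = []
    for elem in seq:
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:  # linear scan of seen
            seen.append(index)
            yield elem

data = [
    {"key": ["a", "b"], "ignore_key": "arb1"},
    {"key": ["a", "b"], "ignore_key": "arb11"},
    {"key": ["c"], "ignore_key": "arb113"},
]
result = list(f_unhashable(data, ["ignore_key"]))
print(result)
```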

Ami tavory · Answer 3 · 2015-05-17T14:33:37+0000

Using the original version of the list:

thelist = [
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

Create a set and populate it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
    uniques.add(cur)

Finally, rebind the name to the new list:

thelist = theNewList

patricia · Answer 4 · 2015-05-17T14:29:58+0000

Instead of a list of dicts, you can use a dict of dicts. The "key" value of each of your dicts becomes its key in the outer dict.

Like this:

thedict = {}

thedict["value1"] = {"ignore_key" : "arb1", ...}  
thedict["value2"] = {"ignore_key" : "arb11", ...}

Since a dict cannot contain duplicate keys, the problem doesn't arise.
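If the data already exists as a list, you can build the dict of dicts from it. This sketch assumes that duplicates are identified by the combination of "key" and "k2". Using a tuple of both as the outer key is an illustrative choice, not part of the answer above.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

thedict = {}
for d in thelist:
    # Tuples are hashable, so the identifying values can form one dict key.
    # setdefault only stores d when the key is absent, so the first duplicate wins.
    thedict.setdefault((d["key"], d["k2"]), d)

print(thedict[("value2", "va2")]["ignore_key"])  # prints arb11
```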

f43d65 · Answer 5 · 2015-05-17T14:42:20+0000

Without modifying thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result)

Brian lee · Answer 6 · 2015-05-17T14:56:31+0000

Create a set of the unique values, then check (and update) it as you go:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist

martineau · Answer 7 · 2015-05-17T16:57:15+0000

You can adapt the accepted answer to a related question by using a dictionary instead of a set to remove the duplicates.

The following example creates a temporary dictionary. Its keys are tuples of the items in each dictionary in thelist, excluding the ignored item. The ignored item is stored as the value associated with each of those keys. This eliminates duplicates because they become the same key, but it retains the ignored key and its value (the last one seen, or the only one).

The second step rebuilds thelist by creating one dictionary from each key tuple in the temporary dictionary plus its associated value.

You could combine these two steps into a completely unreadable one-liner if you like...

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

IGNORED = "ignore_key"
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
thelist = [dict(key + (value,)) for key, value in temp.iteritems()]

for item in thelist:
    print item

Output:

{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}
{'ignore_key': 'arb113', 'k2': 'va2', 'key': 'value2'}
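The code above is Python 2 (dict.iteritems() and the print statement). In Python 3 the same two steps might look like the sketch below. It swaps the tuple key for a frozenset, which is a change from the original answer, so that two dicts with the same items match regardless of their iteration order. Like the original, it needs hashable values and keeps the last ignored value. The order of the resulting list relies on Python 3.7+ dict ordering.

```python
IGNORED = "ignore_key"

thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

# Step 1: map the non-ignored items (as a frozenset, so item order does not
# matter) to the ignored value; later duplicates overwrite earlier ones.
temp = {
    frozenset((k, v) for k, v in d.items() if k != IGNORED): d.get(IGNORED)
    for d in thelist
}

# Step 2: rebuild one dict per unique key, adding the ignored pair back.
thelist = [dict(rest, **{IGNORED: value}) for rest, value in temp.items()]

for item in thelist:
    print(item)
```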
