How can I remove duplicate dicts from a list while ignoring one dict key?
I have a list of dictionaries. Each dictionary has several key-value pairs plus one arbitrary (but important) key-value pair. For example:
thelist = [
{"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
{"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
{"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]
I would like to remove duplicate dictionaries, ignoring only the value under ignore_key when comparing them. I've seen a related question, but it only considers completely identical dicts. Is there a way to remove the near-duplicates so that the above data becomes
thelist = [
{"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
{"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]
It doesn't matter which duplicate is removed. How can I do this?
Keep a set of the values already seen for key and k2, and remove any dict whose values have been seen before:
st = set()
for d in thelist[:]:
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]
If duplicates are always adjacent, you can group on the values of key and k2 and take the first dict from each group:
from itertools import groupby
from operator import itemgetter

thelist[:] = [next(v) for _, v in groupby(thelist, itemgetter("key", "k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
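groupby only merges consecutive items, so if the duplicates may be scattered through the list, one option is to sort on the same key first. This is a minimal sketch, assuming the values under key and k2 can be compared with each other; sample and get_vals are hypothetical names used only for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample in which the two near-duplicates are not adjacent.
sample = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

get_vals = itemgetter("key", "k2")

# sorted() is stable, so within each group the original relative order is
# kept and next(group) returns the earliest of the duplicates.
deduped = [next(group) for _, group in groupby(sorted(sample, key=get_vals), get_vals)]
print(deduped)
```

Sorting costs O(n log n) and reorders the result, so a set-based loop is probably the better fit when the original order matters.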
Or use a generator, as in DSM's answer, to modify the original list without copying it:
def filt(l):
    st = set()
    for d in l:
        vals = d["key"], d["k2"]
        if vals not in st:
            yield d
            st.add(vals)

thelist[:] = filt(thelist)
print(thelist)
[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]
If you don't care which duplicate is removed, just iterate in reverse:
st = set()
for d in reversed(thelist):
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
To compare on every key except ignore_key using groupby (again assuming duplicates are adjacent):
from itertools import groupby
thelist[:] = [next(v) for _, v in groupby(
    thelist, lambda d: [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
You can cram things into a line or two, but I think it's cleaner just to write a function:
def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k, v) for k, v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)
which gives
>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'},
{'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]
This assumes your values are hashable. (If not, the same approach works with seen = [] and seen.append(index), as long as index is also built as a plain dict instead of a frozenset, although for long lists it will perform poorly.)
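A minimal sketch of the list-based fallback for unhashable values. It assumes the comparison index is built as a plain dict, since a frozenset of items would itself need hashable values; dedupe_unhashable and rows are hypothetical names used only for illustration:

```python
def dedupe_unhashable(seq, ignore_keys):
    """Yield the first dict for each distinct set of non-ignored items."""
    seen = []  # a list, because the comparison dicts cannot be hashed
    for elem in seq:
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:  # linear scan using ==, so O(n**2) overall
            seen.append(index)
            yield elem


# Hypothetical data whose values are lists, which are not hashable.
rows = [
    {"key": ["a"], "ignore_key": 1},
    {"key": ["b"], "ignore_key": 2},
    {"key": ["b"], "ignore_key": 3},
]
result = list(dedupe_unhashable(rows, ["ignore_key"]))
print(result)
```

The membership test compares each new index against every stored one, which is where the quadratic cost on long lists comes from.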
Starting with the original list:
thelist = [
{"key" : "value1", "ignore_key" : "arb1"},
{"key" : "value2", "ignore_key" : "arb11"},
{"key" : "value2", "ignore_key" : "arb113"}
]
Create a set and populate it while filtering the list.
uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
        uniques.add(cur)
Finally, rebind the name to the new list:
thelist = theNewList
Instead of using a list of dicts, you can use a dict of dicts. The value stored under "key" in each of your dicts becomes that dict's key in the main dict.
Like this:
thedict = {}
thedict["value1"] = {"ignore_key" : "arb1", ...}
thedict["value2"] = {"ignore_key" : "arb11", ...}
Since the dict will not allow duplicate keys, your problem will not exist.
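A rough sketch of building such a dict of dicts from an existing list. It assumes the value under "key" alone identifies a record; records and by_key are hypothetical names used only for illustration:

```python
# Hypothetical sample data in the same shape as the question's list.
records = [
    {"key": "value1", "ignore_key": "arb1"},
    {"key": "value2", "ignore_key": "arb11"},
    {"key": "value2", "ignore_key": "arb113"},
]

by_key = {}
for d in records:
    # setdefault keeps the first dict stored under each key value;
    # plain assignment (by_key[d["key"]] = d) would keep the last one instead.
    by_key.setdefault(d["key"], d)

deduped = list(by_key.values())
print(deduped)
```

If several fields together identify a record, a tuple of those values could serve as the outer key instead.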
Without modifying thelist:
result = []
seen = set()
thelist = [
{"key" : "value1", "ignore_key" : "arb1"},
{"key" : "value2", "ignore_key" : "arb11"},
{"key" : "value2", "ignore_key" : "arb113"}
]
for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])
print(result)
You can adapt the accepted answer to a related question by using a dictionary instead of a set to remove the duplicates.
The following example creates a temporary dictionary whose keys are tuples of the items in each dictionary in thelist, except for the ignored item, which is stored as the value associated with each key. Duplicates are eliminated because they map to the same key, while the ignored key and its value (the last one seen, or the only one) are retained.
The second step recreates thelist by building dictionaries from each key plus its associated value in the temporary dictionary.
You could combine these two steps into a completely unreadable one-liner if you like...
thelist = [
{"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
{"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
{"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]
IGNORED = "ignore_key"
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
thelist = [dict(key + (value,)) for key, value in temp.items()]
for item in thelist:
    print(item)
Output:
{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}
{'ignore_key': 'arb113', 'k2': 'va2', 'key': 'value2'}