Removing items from one list of dicts that match items in another list
I need to take two lists of dictionaries and filter out of data the garbage items whose names are unrecognized:
data = [
{'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
{'annotation_id': 13, 'record_id': 7, 'name': '----'},
{'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
{'annotation_id': 13, 'record_id': 7, 'name': '----'}
]
So in this case, I need to remove the entry with annotation_id 13 from data.
I've tried iterating over the list and deleting items as I go, but my understanding is that modifying a list while iterating over it doesn't work in Python. I also tried enumerating the list, but couldn't get that to work either. What am I doing wrong? My code is below:
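For context, here is a minimal sketch (standard library only) of why deleting from a list while looping over it misbehaves, plus two safe in-place alternatives. The helper names buggy_remove, safe_remove_copy and safe_remove_slice, and the extra garbage row with annotation_id 14, are hypothetical and only there to make the skipping visible.

```python
bad_names = {'----'}

def buggy_remove(items):
    # Removing from the list being iterated shifts later elements left,
    # so the element right after each removed one is skipped.
    for item in items:
        if item['name'] in bad_names:
            items.remove(item)
    return items

def safe_remove_copy(items):
    # Iterate over a shallow copy (items[:]) while removing from the original.
    for item in items[:]:
        if item['name'] in bad_names:
            items.remove(item)
    return items

def safe_remove_slice(items):
    # Rebuild the contents in place; other references to the list see the change.
    items[:] = [item for item in items if item['name'] not in bad_names]
    return items

sample = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 14, 'record_id': 8, 'name': '----'},  # hypothetical extra garbage row
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]

# list(sample) gives each function its own list object to mutate.
print([d['annotation_id'] for d in buggy_remove(list(sample))])       # [22, 14, 12]
print([d['annotation_id'] for d in safe_remove_copy(list(sample))])   # [22, 12]
print([d['annotation_id'] for d in safe_remove_slice(list(sample))])  # [22, 12]
```

The buggy version leaves annotation_id 14 behind because it sat directly after the removed entry.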
data = [item for g in garbage for item in data if item['name'] != g['name']]
The above code produces many duplicate copies of the dicts.
A simple and elegant way to remove specific entries from a list of dicts, where garbage is the list of dict entries to remove from data (an entry is removed only if all of its keys and values match):
for g in garbage:
    if g in data:
        data.remove(g)
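Note that list.remove deletes only the first matching element. If data could contain the same garbage dict more than once, a sketch that drops every occurrence (still matching whole dicts, and still updating the original list object in place) could look like this; the duplicated garbage row is hypothetical.

```python
data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},  # hypothetical duplicate
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
]

original = data  # a second reference to the same list object

# Slice assignment replaces the list's contents in place, so every
# reference (including `original`) sees the filtered result.
# `d not in garbage` compares whole dicts with ==, costing O(len(garbage)) per item.
data[:] = [d for d in data if d not in garbage]

print(data)
print(original is data)  # True
```

Both copies of the annotation_id 13 entry are gone, whereas a single pass of data.remove(g) would have left one behind.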
Input data:
data = [
{'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
{'annotation_id': 13, 'record_id': 7, 'name': '----'},
{'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
{'annotation_id': 13, 'record_id': 7, 'name': '----'}
]
Result:
data = [
{'record_id': 5, 'annotation_id': 22, 'name': 'Joe Young'},
{'record_id': 9, 'annotation_id': 12, 'name': 'Greg Band'}
]
You can build a set of the garbage names and then filter data against that set (assuming name is the criterion to filter on):
garbage_names = {d['name'] for d in garbage}
[item for item in data if item['name'] not in garbage_names]
#[{'annotation_id': 22, 'name': 'Joe Young', 'record_id': 5},
# {'annotation_id': 12, 'name': 'Greg Band', 'record_id': 9}]
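If you want the set-based speed but need to match on more than the name (for example all three fields, as the remove() approach does), the dicts themselves can't go in a set because they aren't hashable, but a tuple of the relevant values can. A sketch, where KEY_FIELDS and record_key are hypothetical names:

```python
KEY_FIELDS = ('annotation_id', 'record_id', 'name')  # hypothetical choice of matching fields

def record_key(d):
    # Tuples of hashable values can be stored in a set; dicts cannot.
    return tuple(d[k] for k in KEY_FIELDS)

data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
]

# Build the key set once (O(N)), then each membership test is O(1) on average.
garbage_keys = {record_key(g) for g in garbage}
cleaned = [item for item in data if record_key(item) not in garbage_keys]
print([d['annotation_id'] for d in cleaned])  # [22, 12]
```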
As noted in the comments, you can also stay closer to your original approach with [item for item in data if all(item['name'] != g['name'] for g in garbage)]
but it is less efficient: the nested loop has O(M * N) time complexity, whereas building the set first reduces it to O(M + N). Some rough timings:
%timeit [item for item in data if all(item['name'] != g['name'] for g in garbage)]
# 1000000 loops, best of 3: 1.68 µs per loop
%%timeit
garbage_names = {d['name'] for d in garbage}
[item for item in data if item['name'] not in garbage_names]
# 1000000 loops, best of 3: 608 ns per loop
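%timeit is an IPython magic. In a plain Python script, a comparable (and equally rough) measurement can be sketched with the standard timeit module. The function names filter_nested and filter_set are hypothetical, and the numbers will vary with machine and input size; with inputs this small the gap says little about behavior on large lists.

```python
import timeit

data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
]

def filter_nested(data, garbage):
    # O(M * N): every item is compared against every garbage entry.
    return [item for item in data if all(item['name'] != g['name'] for g in garbage)]

def filter_set(data, garbage):
    # O(M + N): build the name set once, then each lookup is O(1) on average.
    garbage_names = {d['name'] for d in garbage}
    return [item for item in data if item['name'] not in garbage_names]

# Both approaches should keep exactly the same items.
assert filter_nested(data, garbage) == filter_set(data, garbage)

N = 100_000
for fn in (filter_nested, filter_set):
    seconds = timeit.timeit(lambda: fn(data, garbage), number=N)
    print(f"{fn.__name__}: {seconds / N * 1e9:.0f} ns per call")
```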