Removing matching items from two lists of dicts

I need to take two lists of dictionaries and filter the garbage items (entries with unrecognized names) out of the first:

data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]

garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

      

So in this case, I need to remove annotation_id 13 from the data.

I've tried iterating over the list and deleting items as I go, but my understanding is that this doesn't work in Python. I also tried enumerating the list, but couldn't get that to work either. What am I doing wrong? My code is below:

data = [[item for item in data if item['name'] != g['name']] for g in garbage]

      

The above code creates many duplicate versions of dicts.



3 answers


A simple way to remove specific entries from a list of dicts, where garbage is a list of the dict entries to remove from data:

for g in garbage:
    if g in data:
        data.remove(g)

      

Input data:



data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]

garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

      

Result:

data = [
    {'record_id': 5, 'annotation_id': 22, 'name': 'Joe Young'}, 
    {'record_id': 9, 'annotation_id': 12, 'name': 'Greg Band'}
]
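
One caveat: list.remove deletes only the first matching element and changes data in place. If data may contain repeated entries, or the original list should stay untouched, a minimal sketch along these lines might help. The helper name remove_entries is hypothetical and the repeated entry is added purely for illustration:

```python
def remove_entries(data, garbage):
    """Return a new list without any dict that equals a garbage entry.

    Hypothetical helper: unlike list.remove, it leaves `data` untouched
    and drops every occurrence of a garbage entry, not just the first.
    """
    return [item for item in data if item not in garbage]


data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},  # repeated entry
]
garbage = [{'annotation_id': 13, 'record_id': 7, 'name': '----'}]

cleaned = remove_entries(data, garbage)
print(cleaned)
```

Here both copies of the garbage entry are dropped from cleaned, while data still holds all four items.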

      



You can build a set of the garbage names and then filter data against that set (assuming name is the criterion to filter on):

garbage_names = {d['name'] for d in garbage}

[item for item in data if item['name'] not in garbage_names]
#[{'annotation_id': 22, 'name': 'Joe Young', 'record_id': 5},
# {'annotation_id': 12, 'name': 'Greg Band', 'record_id': 9}]
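
If the match should be made on another field, such as annotation_id as mentioned in the question, the same idea can be generalized. This is only a sketch: the function name drop_by_key and its key parameter are assumptions for illustration, and it presumes every dict contains the key and that its values are hashable.

```python
def drop_by_key(data, garbage, key):
    """Keep the items of `data` whose `key` value does not occur in `garbage`."""
    # Build the lookup set once so each membership test is O(1) on average.
    unwanted = {g[key] for g in garbage}
    return [item for item in data if item[key] not in unwanted]


data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [{'annotation_id': 13, 'record_id': 7, 'name': '----'}]

by_id = drop_by_key(data, garbage, 'annotation_id')
print(by_id)
```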

      




As noted in the comments, you can also stay closer to your original approach with [item for item in data if all(item['name'] != g['name'] for g in garbage)], but this is slightly less efficient: the double loop has O(M * N) time complexity, whereas pre-building the set reduces it to O(M + N). A naive timing:

%timeit [item for item in data if all(item['name'] != g['name'] for g in garbage)]
# 1000000 loops, best of 3: 1.68 µs per loop

%%timeit
garbage_names = {d['name'] for d in garbage}
[item for item in data if item['name'] not in garbage_names]
# 1000000 loops, best of 3: 608 ns per loop
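
With only three items, these timings mostly reflect fixed overhead. A rough sketch for repeating the comparison on larger inputs outside IPython, using the standard timeit module, could look like this. The synthetic data, its sizes, and the function names are assumptions chosen for illustration, and the actual numbers will vary by machine:

```python
import timeit

# Synthetic inputs; the sizes are arbitrary.
data = [{'annotation_id': i, 'record_id': i, 'name': 'name%d' % i}
        for i in range(2000)]
garbage = [d for d in data if d['annotation_id'] % 10 == 0]  # 200 entries


def with_double_loop():
    # Compares every item against every garbage entry: O(M * N).
    return [item for item in data
            if all(item['name'] != g['name'] for g in garbage)]


def with_set():
    # Builds the set once, then does average O(1) lookups: O(M + N).
    garbage_names = {d['name'] for d in garbage}
    return [item for item in data if item['name'] not in garbage_names]


# Both approaches should produce the same filtered list.
assert with_double_loop() == with_set()

for func in (with_double_loop, with_set):
    seconds = timeit.timeit(func, number=5)
    print('%s: %.4f s for 5 runs' % (func.__name__, seconds))
```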

      



How about a simple filter? (In Python 3, filter returns an iterator, so wrap it in list to get the list shown below.)

list(filter(lambda x: x not in garbage, data))

[{'annotation_id': 22, 'name': 'Joe Young', 'record_id': 5},
 {'annotation_id': 12, 'name': 'Greg Band', 'record_id': 9}]

      
