Removing values from a list of tuples
I have a list of tuples, and I would like to return only the second column of data, keeping only the unique values:
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
Desired output:
['Andrew@gmail.com','Jim@gmail.com','Sarah@gmail.com']
My idea is to iterate over the list, add the item from the second column to a new list, and then use the following code. Before I go too far down this path: I know there is a better way to do it.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]
You can use the zip function:
>>> set(zip(*mytuple)[1])
set(['Sarah@gmail.com', 'Jim@gmail.com', 'Andrew@gmail.com'])
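Note that subscripting the result of zip works only in Python 2, where zip returns a list. As a minimal sketch for Python 3 (an assumption; the answers here were written for Python 2), zip(*mytuple) transposes the rows into columns, and the result has to be materialized before it can be indexed. The names columns and unique_emails are illustrative only:

```python
mytuple = [('Andrew', 'Andrew@gmail.com', '20'), ('Jim', 'Jim@gmail.com', '12'),
           ('Sarah', 'Sarah@gmail.com', '43'), ('Jim', 'Jim@gmail.com', '15'),
           ('Andrew', 'Andrew@gmail.com', '56')]

# zip(*mytuple) turns the rows into columns: (names, emails, numbers).
# In Python 3, zip returns an iterator, so build a list before indexing.
columns = list(zip(*mytuple))

# The second column, de-duplicated (a set does not guarantee any order).
unique_emails = set(columns[1])

# Sorted only to make the printed order deterministic.
print(sorted(unique_emails))
```

If only one column is needed, a set comprehension such as {t[1] for t in mytuple} avoids building the other columns at all.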
Or, as a less efficient alternative, you can use map with operator.itemgetter, then set to drop the duplicates and tuple to get the unique values as a tuple:
>>> from operator import itemgetter
>>> tuple(set(map(itemgetter(1), mytuple)))
('Sarah@gmail.com', 'Jim@gmail.com', 'Andrew@gmail.com')
Timing comparison of some of the answers (each snippet assumes import timeit has been run first):
My answer:
s = """\
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
iCodez's answer:
s = """\
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer:
s = """\
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer:
s = """
from itertools import izip
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
0.237300872803 !!!
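The snippets above time the construction of mytuple together with the de-duplication itself. A rough sketch of how the list could be moved into the setup argument of timeit.timeit, so that only the expression under test is measured (Python 3 syntax; the candidates dictionary and its labels are illustrative names, not taken from the answers):

```python
import timeit

# Code that runs once before each timing and is excluded from the measurement.
setup = """
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim','Jim@gmail.com','12'),
           ('Sarah','Sarah@gmail.com','43'),('Jim','Jim@gmail.com','15'),
           ('Andrew','Andrew@gmail.com','56')]
"""

# Hypothetical labels mapped to the statements being compared.
candidates = {
    "zip": "set(list(zip(*mytuple))[1])",
    "set + list comprehension": "set([k[1] for k in mytuple])",
    "set + map/lambda": "set(map(lambda x: x[1], mytuple))",
}

timings = {}
for label, stmt in candidates.items():
    # Only stmt is executed repeatedly; setup is executed a single time.
    timings[label] = timeit.timeit(stmt=stmt, setup=setup, number=10000)
    print(label, timings[label])
```

The absolute numbers depend on the machine and the Python version, so they are best read as a relative comparison only.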
You can use a list comprehension and a set to keep track of the values seen:
>>> mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew@gmail.com', 'Jim@gmail.com', 'Sarah@gmail.com']
>>>
The most important part of this solution is that the order is preserved, as in your example. Using just set(x[1] for x in mytuple) or something similar will give you unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) part might look a little odd, but it's actually a neat trick that lets you add items to a set inside a list comprehension (otherwise, we would need to use a for-loop).
Since and does short-circuit evaluation in Python, not seen.add(x[1]) is only evaluated if x[1] not in seen returns True. So the condition checks whether x[1] is in the set and adds it if not.
The not is placed before seen.add(x[1]) so that the condition evaluates to True whenever x[1] had to be added to the set (set.add returns None, which is treated as False, and not False is True).
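To make the trick concrete, here is a sketch of the same logic written as an explicit loop (Python 3 print syntax; unique_emails and add_result are illustrative names). It also shows the return value that the not relies on:

```python
mytuple = [('Andrew', 'Andrew@gmail.com', '20'), ('Jim', 'Jim@gmail.com', '12'),
           ('Sarah', 'Sarah@gmail.com', '43'), ('Jim', 'Jim@gmail.com', '15'),
           ('Andrew', 'Andrew@gmail.com', '56')]

seen = set()
unique_emails = []
for x in mytuple:
    if x[1] not in seen:            # first half of the comprehension's condition
        seen.add(x[1])              # the side effect hidden behind `not seen.add(...)`
        unique_emails.append(x[1])  # keep the value the first time it appears

print(unique_emails)

# set.add always returns None, so `not seen.add(...)` is always True.
add_result = set().add('anything')
print(add_result is None)
```

The loop is longer than the comprehension, but it does not depend on a side effect inside a condition, which some readers find easier to follow.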
How about an obvious and simple loop? No need to create a list and then convert to a set, just don't add duplicates.
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
result = []
for item in mytuple:
    if item[1] not in result:
        result.append(item[1])
print result
Output:
['Andrew@gmail.com', 'Jim@gmail.com', 'Sarah@gmail.com']
Is the order of the items important? Many of the suggested answers use set to unique-ify the list. That is good, correct, and performant if the order is not important. If order matters, you can use OrderedDict to perform set-like unique-ification while maintaining order.
# test data
mytuple = [('Andrew','Andrew@gmail.com','20'),('Jim',"Jim@gmail.com",'12'),("Sarah","Sarah@gmail.com",'43'),("Jim","Jim@gmail.com",'15'),("Andrew","Andrew@gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print emails
Yielding:
['Andrew@gmail.com', 'Jim@gmail.com', 'Sarah@gmail.com']
Update
Based on the suggestion from iCodez, the answer can be restated as:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())
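As a side note, on Python 3.7 and later the built-in dict also preserves insertion order. A sketch of the same idea without OrderedDict, assuming such a Python version, could look like this:

```python
mytuple = [('Andrew', 'Andrew@gmail.com', '20'), ('Jim', 'Jim@gmail.com', '12'),
           ('Sarah', 'Sarah@gmail.com', '43'), ('Jim', 'Jim@gmail.com', '15'),
           ('Andrew', 'Andrew@gmail.com', '56')]

# dict.fromkeys keeps one entry per key, in order of first appearance;
# list() over a dict yields its keys, so no .keys() call is needed.
emails = list(dict.fromkeys(t[1] for t in mytuple))
print(emails)
```

On older interpreters, where plain dict ordering is not guaranteed, OrderedDict.fromkeys remains the safer choice.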