How do I initialize a counter from a list of key-value pairs?

If I have a sequence of (key, value) pairs, I can quickly initialize a dictionary like this:

>>> data = [ ('a', 1), ('b', 2) ]
>>> dict(data) 
{'a': 1, 'b': 2} 

      

I would like to do the same with a Counter, but how? Both the constructor and the update() method treat the ordered pairs as keys, not as key-value pairs:

>>> from collections import Counter
>>> Counter(data)
Counter({('a', 1): 1, ('b', 2): 1})

      

The best I could do is use a temporary dictionary, which is an ugly and unnecessary workaround:

>>> Counter(dict(data))
Counter({'b': 2, 'a': 1})

      

Is there a way to properly initialize a Counter from a list of (key, count) pairs? My use case involves reading a large number of stored samples from files (with unique keys).



3 answers


I would just do a loop:

from collections import Counter
counter = Counter()  # start empty, then assign each count directly
for obj, cnt in [ ('a', 1), ('b', 2) ]:
    counter[obj] = cnt

      

You can also just call the parent method, dict.update:

>>> from collections import Counter
>>> data = [ ('a', 1), ('b', 2) ]
>>> c = Counter()
>>> dict.update(c, data)
>>> c
Counter({'b': 2, 'a': 1})

      



Finally, there is nothing wrong with your original solution:

Counter(dict(list_of_pairs))

      

The expensive part of building a dictionary or a counter is hashing all the keys and periodically resizing the hash table. Once the dictionary is built, converting it to a counter is very cheap, about as fast as dict.copy(). The hash values are reused and the final hash table is presized (no need to resize).
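As a quick sanity check, the three approaches above can be compared side by side. This is a minimal sketch assuming unique keys; the data and the variable names (by_loop, by_dict_update, by_temp_dict) are illustrative only.

```python
from collections import Counter

pairs = [('a', 1), ('b', 2), ('c', 3)]  # illustrative data with unique keys

# 1. Explicit loop: assign each count directly.
by_loop = Counter()
for key, count in pairs:
    by_loop[key] = count

# 2. Call the parent class's update, which treats each pair as (key, value)
#    instead of counting the pairs themselves as Counter.update would.
by_dict_update = Counter()
dict.update(by_dict_update, pairs)

# 3. Build a temporary dict first, then convert it.
by_temp_dict = Counter(dict(pairs))

print(by_loop == by_dict_update == by_temp_dict)  # True
print(by_loop['b'], by_loop['missing'])           # 2 0
```

All three yield an ordinary Counter, so missing keys still read as 0.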



From the docs:

Elements are counted from an iterable or initialized from another mapping (or counter)

So no, there is no direct way: you need to convert the pairs to a mapping and then initialize the Counter from it. And yes, initializing it with dict was the right move.

UPDATE

I agree that @RaymondHettinger's code looks good, and it is actually faster:

from collections import Counter
from random import choice
from string import ascii_letters
a=[(choice(ascii_letters), i) for i in range(100)]

      

Tested with Python 3.6.1 and IPython 6

Initialization with dict:



%%timeit
c1=Counter(dict(a))

      

Output:

12.1 µs ± 342 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

      

Updating with dict.update():

%%timeit    
c2=Counter()
dict.update(c2, a)

      

Output:

7.21 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
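For anyone without IPython, roughly the same comparison can be run with the standard timeit module. This is only a sketch: the function names via_dict and via_dict_update and the repetition count are my own choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit
from collections import Counter
from random import choice
from string import ascii_letters

# Same shape of data as above: 100 pairs with random single-letter keys.
a = [(choice(ascii_letters), i) for i in range(100)]

def via_dict():
    # Build a temporary dict, then convert it to a Counter.
    return Counter(dict(a))

def via_dict_update():
    # Fill an empty Counter through the parent class's update method.
    c = Counter()
    dict.update(c, a)
    return c

# Both routes keep the last value for a repeated key, so the results match.
assert via_dict() == via_dict_update()

n = 10_000  # repetitions per measurement; an arbitrary choice
for func in (via_dict, via_dict_update):
    seconds = timeit.timeit(func, number=n)
    print(f"{func.__name__}: {seconds / n * 1e6:.2f} us per call")
```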

      



If the keys in your list of (key, value) pairs are already unique (no duplicates), you can use Raymond Hettinger's excellent solution.

Beware, though: if there are duplicate keys, you only get the last value for any given key:

>>> data=[ ('a', 1), ('b', 2), ('a', 3), ('b', 4) ]
>>> c=Counter()
>>> dict.update(c, data)
>>> c
Counter({'b': 4, 'a': 3})      # note: 'a' and 'b' keep only their last values

      

Same with dict:

>>> Counter(dict(data))
Counter({'b': 4, 'a': 3})

      

But Counters are most often used to count totals, including duplicates. If you want the sums of the values for 'a' and 'b', you need to iterate over all the pairs:

>>> c=Counter()
>>> for k, v in data:
...    c[k]+=v
... 
>>> c
Counter({'b': 6, 'a': 4})        # each key's values summed
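If the summing behavior is needed in more than one place, the loop can be wrapped in a small helper. The name counter_from_pairs is hypothetical, not part of collections. The second half of the sketch relies on Counter.update adding counts when it is given a mapping, which may be handy when merging results read from several files.

```python
from collections import Counter

def counter_from_pairs(pairs):
    """Return a Counter whose counts are the summed values for each key.

    Hypothetical helper for illustration; not a standard-library function.
    """
    totals = Counter()
    for key, value in pairs:
        totals[key] += value  # missing keys start at 0
    return totals

data = [('a', 1), ('b', 2), ('a', 3), ('b', 4)]
print(counter_from_pairs(data))  # Counter({'b': 6, 'a': 4})

# Counter.update with a mapping adds counts rather than replacing them,
# so per-file dicts (each with unique keys) can be merged one at a time.
merged = Counter()
merged.update(dict([('a', 1), ('b', 2)]))
merged.update(dict([('a', 3), ('b', 4)]))
print(merged)  # Counter({'b': 6, 'a': 4})
```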

      
