How to count the 10 most common values in a dict in Python
I'm new to Python and programming in general, so please be kind. I am trying to parse a CSV file with music information and return the top n most-listened-to bands. In the code below, each song listen is recorded as a dict in a list formatted like this:
[{'album': 'Exile on Main Street', 'song': 'Happy', 'datetime': '3 Dec 2014 14:08', 'artist': 'The Rolling Stones'}, {'album': 'II', 'song': 'Black Dog', 'datetime': '1 Dec 2014 08:08', 'artist': 'Led Zepplin'}]
from collections import Counter
def count_artist_plays(filename):
    with open(filename, 'r') as data:
        header = data.readline().strip().split(',')
        entries = []
        for line in data:
            entry = line.strip().split(',')
            listens = {}
            for info, type in enumerate(header):
                listens[type] = entry[info]
            entries.append(listens)
        for d in entries:
            arts = d['artist']
            c = Counter(arts)
        print c.most_common(10)
How can I count the most common whole strings (artist names) instead of individual characters? I get the following:
[('s', 2), ('a', 1), (' ', 1), ('E', 1), ('l', 1), ('o', 1), ('n', 1), ('S', 1), ('v', 1), ('y', 1)]
Initialize the Counter once, let the keys be artists, and increment the count for that key (artist) every time through the loop:
c = Counter()
for d in entries:
    arts = d['artist']
    c[arts] += 1
print(c.most_common(10))
When arts is a string, c = Counter(arts) counts the characters in arts:
In [522]: collections.Counter('Led Zepplin')
Out[522]: Counter({'e': 2, 'p': 2, ' ': 1, 'd': 1, 'i': 1, 'L': 1, 'l': 1, 'n': 1, 'Z': 1})
Compare that with incrementing whole-string keys:
In [523]: c = collections.Counter()
In [524]: c['Led Zepplin'] += 1
In [525]: c['The Rolling Stones'] += 1
In [526]: c.most_common()
Out[526]: [('Led Zepplin', 1), ('The Rolling Stones', 1)]
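The difference comes from Counter counting the elements of whatever iterable it is given. A minimal sketch, using the artist name from the sample data, of how the same string is counted per character when passed directly and as one item when wrapped in a list:

```python
from collections import Counter

artist = 'Led Zepplin'

# A string is an iterable of characters, so each character is counted.
by_char = Counter(artist)

# Wrapping the string in a list makes the whole name a single element.
by_name = Counter([artist])

print(by_char['p'])     # 2
print(by_name[artist])  # 1
```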
Alternatively, as John Clements points out, pass all the artist names to Counter at once and let it do the counting:
c = Counter(d['artist'] for d in entries)
print(c.most_common(10))
Note that the above uses a generator expression to avoid creating a (possibly) large temporary list, and at the same time has a much shorter, more readable syntax.
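Putting the pieces together, here is one possible sketch of the whole task in Python 3. It swaps the manual line.split(',') parsing for the standard library's csv.DictReader, which is not what the question's code uses. The function name top_artists and the inline sample rows are made up for illustration, and the file is assumed to have a header row containing an 'artist' column.

```python
import csv
import io
from collections import Counter


def top_artists(csv_file, n=10):
    """Return the n most common values of the 'artist' column.

    top_artists is a hypothetical helper; csv_file is any open text file
    whose first row is a header containing an 'artist' column (assumed).
    """
    # DictReader uses the header row as keys and handles quoted commas,
    # which a plain line.split(',') would break on.
    reader = csv.DictReader(csv_file)
    return Counter(row['artist'] for row in reader).most_common(n)


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    'artist,album,song,datetime\n'
    'The Rolling Stones,Exile on Main Street,Happy,3 Dec 2014 14:08\n'
    'Led Zepplin,II,Black Dog,1 Dec 2014 08:08\n'
    'The Rolling Stones,Exile on Main Street,Tumbling Dice,4 Dec 2014 09:15\n'
)
print(top_artists(sample, n=2))
# [('The Rolling Stones', 2), ('Led Zepplin', 1)]
```

For a file on disk, something like `with open(filename, newline='') as f: print(top_artists(f))` should work the same way, since the function only needs an open text file.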