Creating a list of lists from a frequency dictionary in Python

I need help finding a shortcut to create a frequency sorted list of lists from a frequency dictionary. I can create a list of lists (see below) by adding each item to the list and then adding each list to a "list of lists" (simple with only 1-3), but what happens if I have frequencies up to 100 or more ?! There must be a better way.

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
list_1 = []
list_2 = []
list_3 = []
list_of_lists = []

for key, value in dictionary.items():
    if value == 1:
            list_1.append(key)
for key, value in dictionary.items():
    if value == 2:
            list_2.append(key)
for key, value in dictionary.items():
    if value == 3:
            list_3.append(key)

list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)

print list_of_lists

      

a copy of the run in Python looks like this:

[['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

This is exactly what I want, but it will not work for a 100,000 word corpus at 100+. Please help me find a better, less tedious way to compile a list of lists.

+3


source to share


6 answers





solution 1 - reverse matching through a list of lists (which was suggested)

You are looking for something like a histogram but the opposite.

def inverseHistogram(valueFreqPairs):
    maxFreq = max(p[1] for p in valueFreqPairs)+1
    R = [[] for _ in range(maxFreq)]
    for value,freq in valueFreqPairs:
        R[freq] += [value]
    return R

      

Demo:

>>> inverseHistogram(dictionary.items())
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

      




solution 2 - Reverse matching via defaultdict pattern (much cleaner)

Better yet, if you're happy with using a dictionary to organize the reverse (which seems more elegant). This is how I personally did it.

reverseDict = collections.defaultdict(list)
for value,freq in dictionary.items():
    reverseDict[freq].append(value)

      

Demo:

>>> dict(reverseDict)
{1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']}

      

sidenote: this will also save you space if your frequencies are sparse, for example. if your input was {'onlyitem':999999999}

, then you avoid creating a list larger than your memory, thereby blocking your machine.

+1


source


Best way: throw them all in a dict

result = {}

for key, value in dictionary.iteritems():
  if not value in result:
    result[value] = []
  result[value].append(key)

      

A bit easier:



from collections import defaultdict
result = defaultdict(list)

for key, value in dictionary.iteritems():
  result[value].append(key)

      

Or create a list:

result = [[]] * max(dictionary.values())

for key, value in dictionary.iteritems():
  result[value-1].append(key)

      

0


source


dict_of_lists = {}

for key, value in dictionary.items():
    if value in dict_of_lists:
        dict_of_lists[value].append(key)
    else:
        dict_of_lists[value] = [key]

list_of_lists = dict_of_lists.values()

      

0


source


You could do something like this:

dictionary = {'a1':2, ..., 'g':100}
MAX_FREQUENCE = max([dictionary[k] for k in dictionary]) //find the max frequency
list_of_lists=[[] for x in range(MAX_FREQUENCE] //generate empty list of lists
for k in dictionary:  
    dictionary[d[k]-1].append(k)

      

-1

since list_of_lists starts at 0. Line of list on the fly: [f(x) for x in iterable]

called list comprehension .

0


source


You can just use the default dictionary to store your data:

import collections

dictionary={'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
lists_by_frequency=collections.defaultdict(list)
for s, f in dictionary.iteritems():
        lists_by_frequency[f].append(s)
list_of_lists=[[] for i in xrange(max(lists_by_frequency)+1)]
for f, v in lists_by_frequency.iteritems():
        list_of_lists[f]=v
print lists_by_frequency
print list_of_lists

      

Output:

defaultdict(<type 'list'>, {1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']})
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

      

As you can see, each group is stored in an index of their frequency. If there is at least one frequency, you can simply subtract it from the final result so you don't end up with an empty zero-offset list.

0


source


Functional way:

import collections

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}

ldict = collections.defaultdict(list)
map(lambda (k, v): ldict[v].append(k), dictionary.iteritems())
list_of_lists = map(lambda x: ldict[x], xrange(0, max(ldict)+1))

print(list_of_lists)

      

This solution uses the same methodology as the solution from hochl. It's functional: so it's shorter, but it takes more time to understand it. :-)

Comment: This is "long" because IMHO the dict / defaultdict constructor (for this use) is too limited.

0


source







All Articles