Print a list of unique items in order of their frequency

Let's say we have an array of integers, or even a continuous stream of integers. The idea is to print unique items in descending order based on frequencies of occurrence. For example, for 7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1, we should get: 2, 4, 6, 7, 9, 5, 0, 1, times, 4 and 6 twice and the rest just once.

Is there a better and more efficient approach than ( sorting map based by value

), counting the events of the elements, storing them in the map, and then sorting the map based on the value.?

+3


source to share


3 answers


However, it seems to me that there must be a very efficient algorithm for this, since there is probably a way to sort the frequencies on the fly.

This problem is actually Omega(nlogn)

in an algebraic tree model (hashing is not allowed in this model), decreasing from Item distinctness problem , so if you were hoping to get one (or constant number) iterations to solve it, without any auxiliary data structure under the hood it is impossible to solve it, since it will allow us to solve the elementality in O (n) in an algebraic tree model.



Canonical solutions for these types of problems:

  • (Your suggestion): Create a map, remove duplicates and sort by frequency.
  • Like 1, but instead of a map - sorting the items and finding the number of times each item is repeated easily with a binary search.
+2


source


If you are using Python use the collections.Counter class and the Counter.most_common () method

from collections import Counter
counted = Counter([7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1])
ordered = [value for value, count in counted.most_common()]
print(ordered)  # [2, 4, 6, 0, 1, 5, 7, 9]

      



Source available showing heapq is used in most_common ()

0


source


Focusing on the requirement to accept the input as a stream of integers, one solution would be a modified insertion sort ...

Create a list of (here named countlist

) lists, initially empty. Each time you accept a new value, try again through each list in countlist

, looking for the value. If you find a value in countlist[i]

, remove the value from its current list and paste it in countlist[i+1]

, adding a new list in countlist

if necessary. If you never find a value, insert the value into countlist[1]

.

The purpose of repeating downward rather than upward is that it allows you to maintain a pointer to where the value will be inserted in countlist[i]

, if you find it in countlist[i-1]

. If you don't need to sort the values ​​that have the same counter, you can skip that pointer and try again.

I believe this algorithm is O (n 2 ) in general. Processing each new value is O (n) and the results are sorted as they go. At any time, you can print the correct order (for now), iterate downward through countlist

and print each list.

A run example using the example in the question ...

input: 7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1

After accepting 7:
countlist[1] = 7

After accepting 4:
countlist[1] = 4, 7

After accepting 2:
countlist[1] = 2, 4, 7

After accepting 4:
countlist[1] = 2, 7
countlist[2] = 4

After accepting 9:
countlist[1] = 2, 7, 9
countlist[2] = 4

After accepting 6:
countlist[1] = 2, 6, 7, 9
countlist[2] = 4

After accepting 5:
countlist[1] = 2, 5, 6, 7, 9
countlist[2] = 4

After accepting 6:
countlist[1] = 2, 5, 7, 9
countlist[2] = 4, 6

After accepting 2:
countlist[1] = 5, 7, 9
countlist[2] = 2, 4, 6

After accepting 0:
countlist[1] = 0, 5, 7, 9
countlist[2] = 2, 4, 6

After accepting 2:
countlist[1] = 0, 5, 7, 9
countlist[2] = 4, 6
countlist[3] = 2

After accepting 1:
countlist[1] = 0, 1, 5, 7, 9
countlist[2] = 4, 6
countlist[3] = 2

      

0


source







All Articles