Python FInd Largest count in last k elements
Given an array of integers and an integer K value, my task is to write a function that prints to standard output the largest number for that value in the array and the past K entries before it.
Input example:
tps: 6, 9, 4, 7, 4, 1
k: 3
Result:
6 9 9 9 7 7
I was told that the code I wrote could be significantly more efficient for large datasets. How can I make this code as efficient as possible?
def tweets_per_second(tps, k):
past = [tps[0]]
for t in tps[1:]:
past.append(t)
if len(past) > k: past = past[-k:]
print max(past)
source to share
You can achieve linear time complexity using a monotonic queue (O (n) for any k). The idea is this:
-
Let the deck of pairs (value, position) be supported. It is initially empty.
-
When a new element arrives, do this: while the position of the front element is out of range (less than I - K), place it. Although the value of the inverse is less than the new one, place it. Finally, press a pair (current item, its position) on the back of the deck.
-
The answer for the current position is the front deque element.
Each item is added to the deque only once and removed at most once. So the time complexity is linear and independent of K. This solution is optimal, since just reading the input is O (n).
source to share
Try using heap to reduce the complexity of the maximum operation in the range from O(K)
to O(logK)
.
- Add
(-tps[i])
* first ,i in range(0,k)
and output(-heap[0])
each time - for the following Nk numbers you have to add to the heap
tps[i]
removetps[i-k]
and print(-heap[0])
All in all, you end up with an O (N log (K)) algorithm, whereas you are using O (N * K). This will be very useful if K is not small.
* Since the implementation of the heap is min (bunch) in a heap [0] as an invariant, if you add -value
, -heap[0]
will be max(heap)
how you want it.
source to share
pandas can do it pretty well:
import pandas as pd
df = pd.DataFrame(dict(data=[6, 9, 4, 7, 4, 1]))
df['running_max'] = pd.expanding_max(df.data)
df['rolling_max'] = pd.rolling_max(df.data, 3, min_periods=0)
print df
data running_max rolling_max
0 6 6 6
1 9 9 9
2 4 9 9
3 7 9 9
4 4 9 7
5 1 9 7
source to share