Log parsing in Python: warning if there are more than N errors in a given period?
I have a log file that contains timestamped data.
I would like to generate an alert if more than N errors are logged in any 5-minute period of this log.
What I don't want to do is define fixed 5-minute periods (e.g. 00-05, 06-10, etc.) and count errors per period, because if N = 10 and I have 8 errors at minute 04 and 8 errors at minute 07, they will fall into two separate buckets and will not generate a warning.
I suppose I could iterate 60 times instead, advancing 1 minute each time and looking at the 5-minute bucket starting at that point, but is there a more elegant or more efficient way?
I would use a sliding window (see "Rolling or sliding window iterator in Python" for a reference) over the list of errors and check on each iteration whether the first and last entries are within 5 minutes of each other.
Example (from "Rolling or sliding window iterator in Python"):
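from collections import deque

def window(seq, n=2):
    """Yield a sliding window (a deque of width n) over the items of seq."""
    it = iter(seq)
    # Pre-fill the window; it is padded with None if seq has fewer than n items
    win = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

for w in window(errors, 10):
    # Timestamps are assumed to be in seconds. 10 errors whose first and
    # last entries are at most 5 minutes apart reach the threshold.
    if w[-1] is not None and w[-1]['timestamp'] - w[0]['timestamp'] <= 60*5:
        print('error')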
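To try the idea end to end, here is a minimal self-contained sketch. It is not the code from the linked question: it checks the same condition (N consecutive errors whose first and last lie within the period) by indexing into a sorted list. The helper name `has_burst`, the sample `errors` list, and the assumption that `timestamp` holds seconds are all made up for illustration.

```python
def has_burst(timestamps, n, period):
    """Return True if any n consecutive timestamps span at most `period` seconds.

    `timestamps` must be sorted in ascending order.
    """
    for i in range(len(timestamps) - n + 1):
        if timestamps[i + n - 1] - timestamps[i] <= period:
            return True
    return False

# Hypothetical sample data: error times in seconds since the start of the log.
errors = [{'timestamp': t} for t in (10, 200, 230, 250, 290, 900)]
times = [e['timestamp'] for e in errors]

print(has_burst(times, 4, 5 * 60))  # are there 4 errors within 5 minutes?
print(has_burst(times, 6, 5 * 60))  # are there 6 errors within 5 minutes?
```

Because the check compares only the first and last entry of each run of N errors, it does not depend on where fixed bucket boundaries would fall.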
I decided to take the advice in depperm's comment (and I would like it to be submitted as an answer, not a comment, so I can mark it as accepted).
It looks something like this:
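import datetime

# log_lines, get_timestamp() and contains_error() are assumed to be defined elsewhere.
error_queue = []
max_errors = 3
for log_line in log_lines:
    log_ts = get_timestamp(log_line)
    if contains_error(log_line):
        error_queue.append(log_ts)
        interval_start = log_ts - datetime.timedelta(minutes=5)
        try:
            # Timestamp of the error max_errors entries back from the latest one
            threshold = error_queue[-max_errors]
        except IndexError:
            # Fewer than max_errors errors logged so far
            continue
        if threshold and threshold >= interval_start:
            # max_errors errors occurred within the last 5 minutes
            raise Exception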
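One possible variation, sketched here under assumptions rather than taken from the comment: keep only the errors from the last 5 minutes in a `collections.deque`, so the queue stays small on long logs. The function name `first_alert` and the sample timestamps are hypothetical, and the input is assumed to be in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta

def first_alert(error_times, max_errors, period=timedelta(minutes=5)):
    """Return the timestamp at which more than `max_errors` errors fall
    within `period`, or None if that never happens.

    `error_times` must be in chronological order.
    """
    recent = deque()
    for ts in error_times:
        recent.append(ts)
        # Drop errors that are older than the window ending at ts.
        while ts - recent[0] > period:
            recent.popleft()
        if len(recent) > max_errors:
            return ts
    return None

# Hypothetical data: 3 errors at 12:04 and 3 errors at 12:07.
base = datetime(2020, 1, 1, 12, 0)
sample = [base + timedelta(minutes=4)] * 3 + [base + timedelta(minutes=7)] * 3
print(first_alert(sample, max_errors=5))
```

The sample mirrors the case from the question: the errors straddle the 00-05 / 06-10 boundary, yet all six are counted together because the window is anchored at each new error rather than at fixed clock times.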