Find records that have related records in the past
Basically, the result I am trying to do is "get the number of successful entries that have 0 failed entries in a given amount of time in the past." "success" and "failure" simply refer to the column value.
While it's a little more complicated, here's a description of the table I'm dealing with:
`log`
id int PRIMARY KEY AUTO_INCREMENT
fingerprint_id int (foreign key)
status boolean
date timestamp
The workflow of the little system we have is that when the user is fingerprinted, a record is added to this table and status
set based on whether it matches (again, there is more, I'm just trying to simplify). We get fingerprint_id
based on what the user is doing, so this is the identifier for linking records to a person.
We now require them to try it no more than 3 times. Thus, they can match 1 of 3, 2 of 3, 3 of 3, or not at all. And that means they can have 1, 2, or 3 entries in their "group". While this is not the case, we can assume that the user will keep trying until they match or reach 3 failed attempts (we found that sometimes people don't continue after failing once or twice, perhaps).
Here's an example of some data:
id fp_id status date
----------------------------------------
20 2 0 '2013-01-21 12:30:01'
21 2 0 '2013-01-21 12:30:05'
22 2 0 '2013-01-21 12:30:10'
23 9 1 '2013-01-21 12:31:30'
24 1 0 '2013-01-21 12:35:00'
25 1 1 '2013-01-21 12:35:05'
In the data, user ( fingerprint_id
) 2 tried 3 times and never matched. User 9 matches the first attempt. User 1 tried once and failed, then tried again and picked up.
The challenge is to find out how many successful ( status
= 1) log entries have 0 failed ( status
= 0) entries within 35 seconds ago. Of course, the only way to "connect" them is fingerprint_id
.
Again, we accept many things, but that's fine.
Here's my attempts:
SELECT COUNT(*)
FROM log AS log_main
WHERE log_main.status=1 AND
(SELECT COUNT(*)
FROM log AS log_inner
WHERE log_inner.fingerprint_id=log_main.fingerprint_id AND
log_inner.status=0 AND
log_inner.date<log_main.date AND log_inner.date>=(log_main.date - INTERVAL 35 SECOND))=0
^ I would expect this to display all successful entries that have a score of 0 unsuccessful entries that happened 35 seconds ago (for this user). But I don't know because the request takes over 600 seconds. I just found out how to extend the MySQL Workbench maximum timeout, but it takes a very long time anyway. There are about 120,000 records in the table, so I'm not sure if that's enough to make this query slow down.
Anyway, here's another try:
SELECT COUNT(*)
FROM (SELECT log.fingerprint_id, log.date
FROM log
WHERE log.status=1) successful,
(SELECT log.fingerprint_id, log.date
FROM log
WHERE log.status=0) unsuccessful
WHERE successful.fingerprint_id=unsuccessful.fingerprint_id AND
unsuccessful.date<successful.date AND unsuccessful.date>=(successful.date - INTERVAL 35 SECOND)
^ I feel like this one is closer, but of course there is no comparison of "counting" the number of records matched in the past. This is the part I am confused about on how to solve it. I have a feeling it has to do with GROUP BY
or instead use IN
, but what I did just doesn't work (in the sense that it takes over 600 seconds or something). Here is an example of what I have tried withGROUP BY
SELECT successful.id, COUNT(*) cnt
FROM (SELECT log.fingerprint_id, log.date, log.id
FROM log
WHERE log.status=1) successful,
(SELECT log.fingerprint_id, log.date, log.id
FROM log
WHERE log.status=0) unsuccessful
WHERE successful.fingerprint_id=unsuccessful.fingerprint_id AND
unsuccessful.date<successful.date AND unsuccessful.date>=(successful.date - INTERVAL 35 SECOND)
GROUP BY successful.id
^ But the results only contain strings that are NOT 0. And I assume it is because of the suggestion WHERE
. But I ONLY need a score of 0.
I've tried so many combinations, I think my brain is just fried.
source to share
Try using NOT EXISTS
instead COUNT = 0
. This should work much better.
SELECT COUNT(*)
FROM log AS log_main
WHERE log_main.status=1
AND NOT EXISTS
( SELECT 1
FROM log AS log_inner
WHERE log_inner.fingerprint_id=log_main.fingerprint_id
AND log_inner.status = 0
AND log_inner.date < log_main.date
AND log_inner.date >= (log_main.date - INTERVAL 35 SECOND)
);
You also need to make sure the table is indexed correctly.
EDIT
I find that using is LEFT JOIN/IS NULL
more efficient in MySQL than using NOT EXISTS
, so this will work better than the above (although perhaps not significantly):
SELECT COUNT(*)
FROM log AS log_main
LEFT JOIN log AS log_inner
ON log_inner.fingerprint_id=log_main.fingerprint_id
AND log_inner.status = 0
AND log_inner.date < log_main.date
AND log_inner.date >= (log_main.date - INTERVAL 35 SECOND)
WHERE log_main.status = 1
AND Log_inner.fingerprint_id IS NULL;
EDIT 2
To get records with 1 or 2 attempts, etc., I will still use JOIN, but like this:
SELECT COUNT(*)
FROM ( SELECT log_Main.id
FROM log AS log_main
INNER JOIN log AS log_inner
ON log_inner.fingerprint_id=log_main.fingerprint_id
AND log_inner.status = 0
AND log_inner.date < log_main.date
AND log_inner.date >= (log_main.date - INTERVAL 35 SECOND)
WHERE log_main.status = 1
AND Log_inner.fingerprint_id IS NULL
GROUP BY log_Main.id
HAVING COUNT(log_Inner.id) = 1
) d
source to share