How to connect extensible SQL to find users who log in continuously for n days
If I have a table (Oracle or MySQL) that stores the date user logins.
So how can I write SQL (or whatever) to find users who log in continuously for n days.
For example:
userID | logindate
1000 2014-01-10
1000 2014-01-11
1000 2014-02-01
1000 2014-02-02
1001 2014-02-01
1001 2014-02-02
1001 2014-02-03
1001 2014-02-04
1001 2014-02-05
1002 2014-02-01
1002 2014-02-03
1002 2014-02-05
.....
We can see that user 1000 constantly connects for two days in 2014 and user 1001 connects continuously for 5 days. and user 1002 will never log in.
The SQL needs to be extensible, which means I can select every number n and change slightly or pass a new parameter and the results are as expected.
Thank!
source to share
Since we don't know which dbms you are using (you named MySQL and Oracle), here are two solutions, they both do the same: order the rows and subtract the robel days from the login date (so if the 6th entry is 2014-02- 12, and the 7th is 2014-02-13, both of which lead to 2014-02-06). So we group the user and this group day and count the days. We then group the user to find the longest streak.
Here is a solution for dbms with analytic window functions (like Oracle):
select userid, max(days)
from
(
select userid, groupday, count(*) as days
from
(
select
userid, logindate - row_number() over (partition by userid order by logindate) as groupday
from mytable
)
group by userid, groupday
)
group by userid
--having max(days) >= 3
And here is a MySQL query (untested because I don't have MySQL):
select
userid, max(days)
from
(
select
userid, date_add(logindate, interval -row_number day) as groupday, count(*) as days
from
(
select
userid, logindate,
@row_num := @row_num + 1 as row_number
from mytable
cross join (select @row_num := 0) r
order by userid, logindate
)
group by userid, groupday
)
group by userid
-- having max(days) >= 3
source to share
I think the following query will give you a very extensible parameterization:
select z.userid, count(*) continuous_login_days
from
(
with max_dates as
( -- Get max date for every user ID
select t.userid, max(t.logindate) max_date
from test t
group by t.userid
),
ranks as
( -- Get ranks for login dates per user
select t.*,
row_number() over
(partition by t.userid order by t.logindate desc) rnk
from test t
)
-- So here, we select continuous days by checking if rank inside group
-- (per user ID) matches login date compared to max date
select r.userid, r.logindate, r.rnk, m.max_date
from ranks r, max_dates m
where m.userid = r.userid
and r.logindate + r.rnk - 1 = m.max_date -- here is the key
) z
-- Then we only group by user ID to get the number of continuous days
group by z.userid
;
Here's the result:
USERID CONTINUOUS_LOGIN_DAYS
1 1000 2
2 1001 5
3 1002 1
So, you can just select the query field CONTINUOUS_LOGIN_DAYS
.
EDIT . If you want to select from all ranges (not just the last one), my query structure no longer works because it relied on the last range. But here's a workaround:
with w as
( -- Parameter
select 2 nb_cont_days from dual
)
select *
from
(
select t.*,
-- Get number of days around
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days + 1
and t.logindate) m1,
-- Get also number of days more in the past, and in the future
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days
and t.logindate + 1) m2,
w.nb_cont_days
from w, test t
) x
-- If these 2 fields match, then we have what we want
where x.m1 = x.nb_cont_days
and x.m2 = x.nb_cont_days
order by 1, 2
You just need to change the parameter in the clause WITH
, so you can even create a function from that query to call it using that parameter.
source to share