How to connect extensible SQL to find users who log in continuously for n days

If I have a table (Oracle or MySQL) that stores the date user logins.

So how can I write SQL (or whatever) to find users who log in continuously for n days.

For example:

userID  |  logindate
1000       2014-01-10
1000       2014-01-11
1000       2014-02-01
1000       2014-02-02
1001       2014-02-01
1001       2014-02-02
1001       2014-02-03
1001       2014-02-04
1001       2014-02-05
1002       2014-02-01
1002       2014-02-03
1002       2014-02-05
.....

      

We can see that user 1000 constantly connects for two days in 2014 and user 1001 connects continuously for 5 days. and user 1002 will never log in.

The SQL needs to be extensible, which means I can select every number n and change slightly or pass a new parameter and the results are as expected.

Thank!

+3


source to share


3 answers


Since we don't know which dbms you are using (you named MySQL and Oracle), here are two solutions, they both do the same: order the rows and subtract the robel days from the login date (so if the 6th entry is 2014-02- 12, and the 7th is 2014-02-13, both of which lead to 2014-02-06). So we group the user and this group day and count the days. We then group the user to find the longest streak.

Here is a solution for dbms with analytic window functions (like Oracle):

select userid, max(days)
from
(
  select userid, groupday, count(*) as days
  from
  (
    select
      userid, logindate - row_number() over (partition by userid order by logindate) as groupday
    from mytable
  )
  group by userid, groupday
)
group by userid
--having max(days) >= 3

      



And here is a MySQL query (untested because I don't have MySQL):

select
  userid, max(days)
from
(
  select
    userid, date_add(logindate, interval -row_number day) as groupday, count(*) as days
  from
  (
    select 
      userid, logindate,
      @row_num := @row_num + 1 as row_number
    from mytable
    cross join (select @row_num := 0) r
    order by userid, logindate
  )
  group by userid, groupday
)
group by userid
-- having max(days) >= 3

      

+1


source


I think the following query will give you a very extensible parameterization:

select z.userid, count(*) continuous_login_days 
from
( 
  with max_dates as
  ( -- Get max date for every user ID
    select t.userid, max(t.logindate) max_date
    from test t
    group by t.userid
  ),
  ranks as
  ( -- Get ranks for login dates per user
    select t.*,
           row_number() over
               (partition by t.userid order by t.logindate desc) rnk
    from test t
  )
  -- So here, we select continuous days by checking if rank inside group
  -- (per user ID) matches login date compared to max date
  select r.userid, r.logindate, r.rnk, m.max_date
  from ranks r, max_dates m
  where m.userid = r.userid
    and r.logindate + r.rnk - 1 = m.max_date -- here is the key
) z
-- Then we only group by user ID to get the number of continuous days
group by z.userid
;

      

Here's the result:

    USERID  CONTINUOUS_LOGIN_DAYS
1   1000    2
2   1001    5
3   1002    1

      



So, you can just select the query field CONTINUOUS_LOGIN_DAYS

.

EDIT . If you want to select from all ranges (not just the last one), my query structure no longer works because it relied on the last range. But here's a workaround:

with w as
( -- Parameter
  select 2 nb_cont_days from dual
)
select *
from
(
  select t.*,
         -- Get number of days around
         (select count(*) from test t2
          where t2.userid = t.userid
            and t2.logindate between t.logindate - nb_cont_days + 1
                                 and t.logindate) m1,
         -- Get also number of days more in the past, and in the future
         (select count(*) from test t2
          where t2.userid = t.userid
            and t2.logindate between t.logindate - nb_cont_days
                                 and t.logindate + 1) m2,
         w.nb_cont_days
  from w, test t
) x
-- If these 2 fields match, then we have what we want
where x.m1 = x.nb_cont_days
  and x.m2 = x.nb_cont_days
order by 1, 2

      

You just need to change the parameter in the clause WITH

, so you can even create a function from that query to call it using that parameter.

+1


source


SELECT userID,count(userID) as numOfDays FROM LOGINTABLE WHERE logindate between '2014-01-01' AND '2014-02-28'
GROUP BY userID

      

In this case, you can check the login days for each user for a certain period

0


source







All Articles