MySQL counts new VS users daily (cohort analysis)

Table structure: user_id, Date (I use to work with timestamp)

eg

user id | Date (TS)
A       | '2014-08-10 14:02:53' 
A       | '2014-08-12 14:03:25' 
A       | '2014-08-13 14:04:47'
B       | '2014-08-13 04:04:47'
...

      

and next week I have

user id | Date (TS)
A       | '2014-08-17 09:02:53'     
B       | '2014-08-17 10:04:47'
B       | '2014-08-18 10:04:47'
A       | '2014-08-19 10:04:22'
C       | '2014-08-19 11:04:47'
...

      

and today I have

user id | Date (TS)
A       | '2015-05-27 09:02:53'     
B       | '2015-05-27 10:04:47'
C       | '2015-05-27 10:04:22'
D       | '2015-05-27 17:04:47'

      

I need to know how to execute a single query to find the number of users who are "returned" users from the very beginning.

Expected results:

date        | New user | returned User
2014-08-10  |  1       | 0
2014-08-11  |  0       | 0
2014-08-12  |  0       | 1 (A was active on 08/11)
2014-08-13  |  1       | 1 (A was active on 08/12 & 08/11)
...
2014-08-17  |  0       | 2 (A & B were already active )
2014-08-18  |  0       | 1 
2014-08-19  |  1       | 1 
...
2015-05-27  |  1       | 3 (D is a new user) 

      

After doing some lengthy searching on Stackoverflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly active users for each day from the log , but I was unable to modify its query to output my expected results ...

thanks for the help

+3


source to share


2 answers


Assuming you have a date table somewhere (and using t-sql syntax because I know it's better ...), the key is to calculate the mind for each user separately, calculate the total number of users that day. and then simply declaring the returning user as a user who was not new:



SELECT DateTable.Date, NewUsers, NumUsers - NewUsers AS ReturningUsers
FROM
DateTable
    LEFT JOIN
        (
        SELECT MinDate, COUNT(user_id) AS NewUsers
        FROM (
                SELECT user_id, min(CAST(date AS Date)) as MinDate
                FROM Table
                GROUP BY user_id
            ) A
        GROUP BY MinDate
        ) B ON DateTable.Date = B.MinDate
    LEFT JOIN
        (
        SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
        FROM Table
        GROUP CAST(date AS Date)
        ) C ON DateTable.Date = C.Date

      

+3


source


Thanks to Steven, I made a short fix at his request that works well even if it takes a little time on a large database:



SELECT 
    DATE(Stats.Created),
    NewUsers,
    NumUsers - NewUsers AS ReturningUsers
FROM
    Stats
LEFT JOIN
    (
        SELECT
            MinDate,
            COUNT(user_id) AS NewUsers
        FROM (
            SELECT
                user_id,
                MIN(DATE(Created)) as MinDate
            FROM Stats
            GROUP BY user_id
        ) A
        GROUP BY MinDate
    ) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
    (
        SELECT 
            DATE(Created) AS Date,
            COUNT(DISTINCT user_id) AS NumUsers
        FROM Stats
        GROUP BY DATE(Created)
    ) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)

      

+1


source







All Articles