How do I create a weekly cohort analysis table using mysql?
Let's say you have a users table that has at least the user's registration date and ID.
Now let's say you have a separate table that tracks an action, such as a payment, that can happen at any point in the user's life. (Let's say like an in-app purchase.) In this table, we keep track of the User ID, Payment Date, and Payment ID.
So, we have something similar to set up our schema:
CREATE TABLE users (
UserId INT,
AddedDate DATETIME
);
CREATE TABLE payments (
PaymentId INT,
UserId INT,
PaymentDate Datetime
);
Now you need a table showing weekly cohorts. A table that looks something like this:
Week size w1 w2 w3 w4 w5 w6 w7
2017-08-28 1 0 0 0 1 0 0 0
2017-09-04 3 1 0 2 0 1 1 2
2017-09-11 2 0 0 1 0 0 0 1
2017-09-18 6 3 1 4 3 1 1 2
2017-09-25 2 1 1 1 0 1 2 0
2017-10-02 7 5 2 3 4 3 1 0
2017-10-09 7 4 5 1 2 5 0 0
2017-10-16 2 1 2 1 1 0 0 0
2017-10-23 7 5 4 4 3 0 0 0
2017-10-30 8 8 7 0 0 0 0 0
2017-11-06 5 5 2 0 0 0 0 0
So the first column has a week, the second is the number of people who signed up this week. Let's say we are looking at week 2017-09-18. 6 people were registered this week. The 3 below the w1 column means that 3 of that 6 people made a purchase in the week they signed up. 1 under w2 means that 1 out of that 6 people made a purchase in the second week they were registered, and so on.
What query would I use to get a table that looks like this?
source to share
This query changes from the one I wrote here: Cohort Analysis in SQL
Here's the last request:
SELECT
STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%X-%V %W') as date,
size,
w1,
w2,
w3,
w4,
w5,
w6,
w7
FROM (
SELECT u.cohort,
IFNULL(SUM(s.Offset = 0), 0) w1,
IFNULL(SUM(s.Offset = 1), 0) w2,
IFNULL(SUM(s.Offset = 2), 0) w3,
IFNULL(SUM(s.Offset = 3), 0) w4,
IFNULL(SUM(s.Offset = 4), 0) w5,
IFNULL(SUM(s.Offset = 5), 0) w6,
IFNULL(SUM(s.Offset = 6), 0) w7
FROM (
SELECT
UserId,
DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users
) as u
LEFT JOIN (
SELECT DISTINCT
payments.UserId,
FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
FROM payments
LEFT JOIN users ON (users.UserId = payments.UserId)
) as s ON s.UserId = u.UserId
GROUP BY u.cohort
) as tb
LEFT JOIN (
SELECT DATE_FORMAT(AddedDate, "%Y-%u") dt, COUNT(*) size FROM users GROUP BY dt
) size ON tb.cohort = size.dt
Such a core is that we grab users and their registration date and format the date by number for a week, as we make a weekly cohort.
SELECT
UserId,
DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users
Since we want to group cohorts, we must put this in a subquery in the FROM part of the query.
Next, we want to associate payment information with users.
SELECT DISTINCT
payments.UserId,
FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
FROM payments
LEFT JOIN users ON (users.UserId = payments.UserId)
This will give unique weekly billing events for each user based on the number of weeks they have been a user. We use different ones because if a user made 2 purchases in one week, we don't want to count it as two users.
We are not just using a payment table because some users may register and not have payments. So we select from the users table and join the payment table.
Then you are grouped by week - u.cohort. You then sum by week numbers to see how many people made payments a few weeks after they signed up.
On the mysql version, I used sql_mode set by only_full_group_by. So to get the size of the cohort, I put the main body of the query in a subquery so that I can join users to get the size of the cohort.
Further considerations:
Filtering by week is simple. tb.cohort> start date and tb.cohort <where start and end date are formatted with "% Y-% u". To make the query more efficient, you probably want to filter out payment events that also don't fall within the date range, so you don't join on data you don't need.
You might want to use a calendar table to cover when there are no user subscribers for a week.
Here's a script with everything that works: http://sqlfiddle.com/#!9/172dbe/1
source to share
I tried to delay a month using the same concept
But not sure if my result is correct, but it produces some result, which is ordered in reverse order, which needs to be changed
If any help is deemed complete
Where I have two tables as deviceinfo and session In the deviceinfo table has id (id), device_id (unique deviceid) and created_at (datetime), the session table has id (session table id), deviceid (deviceid which refers to the deviceinfo id) and start_time (datetime)
My request
SELECT u.DayOffset as month,
SUM(s.DayOffset = 0) AS m0,
SUM(s.DayOffset = 1) AS m1,
SUM(s.DayOffset = 2) AS m2,
SUM(s.DayOffset = 3) AS m3,
SUM(s.DayOffset = 4) AS m4,
SUM(s.DayOffset = 5) AS m5,
SUM(s.DayOffset = 6) AS m6
FROM
( SELECT id, month(created_at) AS DayOffset FROM deviceinfo ) as u
LEFT JOIN ( SELECT DISTINCT sessions.deviceid, TIMESTAMPDIFF(Month,sessions.start_time, deviceinfo.created_at)+ 6
AS DayOffset FROM sessions LEFT JOIN deviceinfo ON (deviceinfo.id = sessions.deviceid)) as s
ON s.deviceid = u.id GROUP BY u.DayOffset
My result is like
month m0 m1 m2 m3 m4 m5 m6
9 0 0 0 0 0 3 6
10 0 0 0 0 0 0 6
11 0 0 0 0 0 0 32
But the expected result should be
month m0 m1 m2 m3 m4 m5 m6
9 6 3 0 0 0 0 0
10 6 0 0 0 0 0 0
11 32 0 0 0 0 0 0
Can you say that I am approaching in the correct way?
Thank you in advance
source to share