How do I create a weekly cohort analysis table using mysql?

Let's say you have a users table that has at least the user's registration date and ID.

Now let's say you have a separate table that tracks an action, such as a payment, that can happen at any point in the user's life. (Let's say like an in-app purchase.) In this table, we keep track of the User ID, Payment Date, and Payment ID.

So, we have something similar to set up our schema:

CREATE TABLE users (
  UserId INT,
  AddedDate DATETIME
);

CREATE TABLE payments (
  PaymentId INT,
  UserId INT,
  PaymentDate Datetime
);

      

Now you need a table showing weekly cohorts. A table that looks something like this:

Week       size w1  w2  w3  w4  w5  w6  w7
2017-08-28  1   0   0   0   1   0   0   0
2017-09-04  3   1   0   2   0   1   1   2
2017-09-11  2   0   0   1   0   0   0   1
2017-09-18  6   3   1   4   3   1   1   2
2017-09-25  2   1   1   1   0   1   2   0
2017-10-02  7   5   2   3   4   3   1   0
2017-10-09  7   4   5   1   2   5   0   0
2017-10-16  2   1   2   1   1   0   0   0
2017-10-23  7   5   4   4   3   0   0   0
2017-10-30  8   8   7   0   0   0   0   0
2017-11-06  5   5   2   0   0   0   0   0

      

So the first column has a week, the second is the number of people who signed up this week. Let's say we are looking at week 2017-09-18. 6 people were registered this week. The 3 below the w1 column means that 3 of that 6 people made a purchase in the week they signed up. 1 under w2 means that 1 out of that 6 people made a purchase in the second week they were registered, and so on.

What query would I use to get a table that looks like this?

0


source to share


2 answers


This query changes from the one I wrote here: Cohort Analysis in SQL

Here's the last request:

SELECT
  STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%X-%V %W') as date,
  size,
  w1,
  w2,
  w3,
  w4,
  w5,
  w6,
  w7
FROM (
  SELECT u.cohort, 
    IFNULL(SUM(s.Offset = 0), 0) w1,
    IFNULL(SUM(s.Offset = 1), 0) w2,
    IFNULL(SUM(s.Offset = 2), 0) w3,
    IFNULL(SUM(s.Offset = 3), 0) w4,
    IFNULL(SUM(s.Offset = 4), 0) w5,
    IFNULL(SUM(s.Offset = 5), 0) w6,
    IFNULL(SUM(s.Offset = 6), 0) w7
  FROM (
   SELECT
      UserId,
      DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
    FROM users
  ) as u
  LEFT JOIN (
      SELECT DISTINCT
      payments.UserId,
      FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
      FROM payments
      LEFT JOIN users ON (users.UserId = payments.UserId)
  ) as s ON s.UserId = u.UserId
  GROUP BY u.cohort
) as tb
LEFT JOIN (
  SELECT DATE_FORMAT(AddedDate, "%Y-%u") dt, COUNT(*) size FROM users GROUP BY dt
) size ON tb.cohort = size.dt

      

Such a core is that we grab users and their registration date and format the date by number for a week, as we make a weekly cohort.

SELECT
  UserId,
  DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users

      

Since we want to group cohorts, we must put this in a subquery in the FROM part of the query.

Next, we want to associate payment information with users.

SELECT DISTINCT
  payments.UserId,
  FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
  FROM payments
  LEFT JOIN users ON (users.UserId = payments.UserId)

      



This will give unique weekly billing events for each user based on the number of weeks they have been a user. We use different ones because if a user made 2 purchases in one week, we don't want to count it as two users.

We are not just using a payment table because some users may register and not have payments. So we select from the users table and join the payment table.

Then you are grouped by week - u.cohort. You then sum by week numbers to see how many people made payments a few weeks after they signed up.

On the mysql version, I used sql_mode set by only_full_group_by. So to get the size of the cohort, I put the main body of the query in a subquery so that I can join users to get the size of the cohort.

Further considerations:

Filtering by week is simple. tb.cohort> start date and tb.cohort <where start and end date are formatted with "% Y-% u". To make the query more efficient, you probably want to filter out payment events that also don't fall within the date range, so you don't join on data you don't need.

You might want to use a calendar table to cover when there are no user subscribers for a week.

Here's a script with everything that works: http://sqlfiddle.com/#!9/172dbe/1

0


source


I tried to delay a month using the same concept

But not sure if my result is correct, but it produces some result, which is ordered in reverse order, which needs to be changed

If any help is deemed complete

Where I have two tables as deviceinfo and session In the deviceinfo table has id (id), device_id (unique deviceid) and created_at (datetime), the session table has id (session table id), deviceid (deviceid which refers to the deviceinfo id) and start_time (datetime)

My request

SELECT u.DayOffset as month,
 SUM(s.DayOffset = 0) AS m0,
 SUM(s.DayOffset = 1) AS m1,
 SUM(s.DayOffset = 2) AS m2,
 SUM(s.DayOffset = 3) AS m3,
 SUM(s.DayOffset = 4) AS m4,
 SUM(s.DayOffset = 5) AS m5,
 SUM(s.DayOffset = 6) AS m6
 FROM
  ( SELECT id, month(created_at) AS DayOffset FROM deviceinfo ) as u
  LEFT JOIN ( SELECT DISTINCT sessions.deviceid, TIMESTAMPDIFF(Month,sessions.start_time, deviceinfo.created_at)+ 6
  AS DayOffset FROM sessions LEFT JOIN deviceinfo ON (deviceinfo.id = sessions.deviceid)) as s
  ON s.deviceid = u.id GROUP BY u.DayOffset

      



My result is like

month      m0  m1  m2  m3  m4  m5  m6  
  9         0   0   0   0   0   3   6   
 10         0   0   0   0   0   0   6   
 11         0   0   0   0   0   0   32  

      

But the expected result should be

 month      m0  m1  m2  m3  m4  m5  m6  
  9         6   3   0   0   0   0   0   
 10         6   0   0   0   0   0   0   
 11         32  0   0   0   0   0   0

      

Can you say that I am approaching in the correct way?

Thank you in advance

0


source







All Articles