Cohort Analysis with Amazon Redshift / PostgresSQL

Question

Cohort Analysis with Amazon Redshift / PostgresSQL

I am trying to analyze user retention using cohort analysis based on event data stored in Redshift.

For example, in Redshift I have:

timestamp          action        user id
---------          ------        -------
2015-05-05 12:00   homepage      1
2015-05-05 12:01   product page  1
2015-05-05 12:02   homepage      2
2015-05-05 12:03   checkout      1

I would like to extract the daily cohort. For example:

signup_day  users_count d1  d2  d3  d4  d5  d6  d7 
----------  ----------- --  --  --  --  --  --  --  
2015-05-05  100         80  60  40  20  17  16  12
2015-05-06  150         120 90  60  30  22  18  15

Where signup_day

is the first date, we have a record of user actions, users_count

- the total number of users who have signed up for signup_day

, d1

- the number of users who have performed an action. day after signup_day

, etc.

Is there a better way to represent storage analysis data?

What would be the best query to achieve this with Amazon Redshift? Can I do it with one request?

+3

sql metrics amazon-web-services amazon-redshift

Ita 07 June 15 at 20:31

source to share

1 answer

Ita · Answer 1 · 2015-06-17T14:12:03+0000

I eventually found the request below to satisfy my requirements.

WITH 

users AS (
  SELECT
    user_id,
    date_trunc('day', min(timestamp)) as activated_at
    from table
    group by 1
  )
,

events AS (
  SELECT user_id,
         action,
         timestamp AS occurred_at
    FROM table
)

SELECT DATE_TRUNC('day',u.activated_at) AS signup_date,


       TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_At)/(3600*24)) AS user_period,


       COUNT(DISTINCT e.user_id) AS retained_users
  FROM users u
  JOIN events e
    ON e.user_id = u.user_id
   AND e.occurred_at >= u.activated_at
 WHERE u.activated_at >= getdate() - INTERVAL '11 day'
 GROUP BY 1,2
 ORDER BY 1,2

It produces a slightly different table than I described above (but better for my needs):

signup_date  user_period  retained_users
-----------  -----------  --------------
2015-05-05   0            80
2015-05-05   1            60
2015-05-05   2            40
2015-05-05   3            20
2015-05-06   0            100
2015-05-06   1            80
2015-05-06   2            40
2015-05-06   3            20

Cohort Analysis with Amazon Redshift / PostgresSQL

More articles: