Weekly active users for each day from the journal

I was wondering if anyone could help me with some SQL to return the number of unique users logged to the database table for two or more days (let's say 7 days as a reference).

My log table has a timestamp (ts) and user_id on each row representing the activity from that user at that time.

The following query returns the Daily Log Users or DAU from that log:

SELECT FLOOR(ts / 86400) AS day, COUNT(DISTINCT user_id) AS dau
FROM log
GROUP BY day ORDER BY day ASC

      

Now let me say that I would like to add to this single request (or at least restore the most efficient way) Weekly active users or total unique users registered for 7 days. However, I don't want to divide my time into non-overlapping weeks. What I need is to count, for each day, the different user_users visible for that day and the 6 previous days.

For example:

day users wau
1   1,2   2
4   1,3   3
7   3,4,5 5
8   5     4    (user_id 2 lost from count)
15  2     2    (user_ids 1,3,4 lost from count)

      

Thanks for any help you can provide, and feel free to ask via comment if you need to clarify.

0


source to share


3 answers


To get the "Weekly Average User" counter (in my understanding of your spec ... "for each day, the number of individual user_users seen during this day and the previous six days"), a row query one of these can be used. (The query also returns the Daily Average User counter.

SELECT d.day
     , COUNT(DISTINCT u.user_id) AS wau
     , COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
  FROM ( SELECT FLOOR(k.ts/86400) AS `day`
           FROM `log` k
          GROUP BY `day`
       ) d
  JOIN ( SELECT FLOOR(l.ts/86400) AS `day`
              , l.user_id
           FROM `log` l
          GROUP BY `day`, l.user_id
       ) u
    ON u.day <= d.day
   AND u.day > d.day - 7
 GROUP BY d.day
 ORDER BY d.day

      

(I haven't tested this yet, but later, and I'll update this expression if you need any fixes.)

This query merges a list of users for a given day (from u

rowsource), over several days from a log table ( d

rowsource). Note the alphabetic character "7" that appears in the join predicate (ON clause) and that the retrieval of the user list "matches" the previous 6 days.

Note that this can also be extended to get a separate user count for the last 3 days, for example by adding another expression to the SELECT list.

     , COUNT(DISTINCT IF(u.day<=d.day AND u.day>d.day-3,u.user_id,NULL)) AS 3day

      

That the literal "7" can be increased to get a larger range. And that literal 3 in the expression above can be changed to get a certain number of days ... we just need to make sure that we have enough lines from the previous day (from d

) concatenated with each line from u

.



ORDER OF EXECUTION: Because of the embedded views (or views as MySQL calls them), this query may not be very fast because the results for these embedded views must be implemented in MyISAM staging tables.

Alias ​​inline view u

may not be optimal; it might be faster to join the log table. I was thinking about getting a unique list of users for a given day and this is what got this request in the inline view. It was easier for me to understand what was going on. And I thought that if you had hundreds of the same users entered throughout the day, the inline view would rip out a whole bunch of duplicates before we joined other days. A WHERE clause to limit the number of days we return would be best added inside inline views u

and d

. (In string representation d

, an additional earlier 6 days must be added.)


On the other hand, if the ts column is a TIMESTAMP datatype, I would be more inclined to use an expression DATE(ts)

to extract the date part. But this will return the datatype DATE in the result set, not an integer which will be different from the result set you specified.)

SELECT d.day
     , COUNT(DISTINCT u.user_id) AS wau
     , COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
  FROM ( SELECT DATE(k.ts) AS `day`
           FROM `log` k
          GROUP BY `day`
       ) d
  JOIN ( SELECT DATE(l.ts) AS `day`
              , l.user_id
           FROM `log` l
          GROUP BY `day`, l.user_id
       ) u
    ON u.day <= d.day
   AND u.day > DATE_ADD(d.day, INTERVAL -7 DAY)
 GROUP BY d.day
 ORDER BY d.day

      


+3


source


Here is another great example of why one should use date, date or time field types to represent time values ​​in the database, rather than unix timestamps. Invariably, someone wants to actually query the field and then you have to do a bunch of timestamp conversions, since integer timestamp values ​​don't have the inherent concept of time periods and you need to query data based on time periods. In the process, you lose any ability to use indexes on the fields.

Anyway, this is a rather complex query you want to make. Maybe a better way than what I am suggesting, but hopefully what I am suggesting at least makes sense. In this approach, you have to do a cartesian join by joining the table to yourself. Then you limit the number of records using a condition ON

to ensure that the dates in the second log table are within the seven day period of the date in the first log table. Finally, you do the aggregation and grouping. The request might look like this:

SELECT DATE(FROM_UNIXTIME(log1.ts)) as `day`, COUNT(DISTINCT log2.user_id) as `dau`
FROM log AS log1
INNER JOIN log AS log2
ON DATE(FROM_UNIXTIME(log2.ts)) <= DATE(FROM_UNIXTIME(log1.ts))
AND DATE(FROM_UNIXTIME(log2.ts)) >= DATE_SUB(DATE(FROM_UNIXTIME(log1.ts)), INTERVAL 7 DAY)
GROUP BY `day`
ORDER BY `day` ASC

      

Warning. If you have a decent enough number of log entries, this query will take a long time since you will be multiplying the number of entries in the result set by some factor and you will not be using indexes.



Your best bet is to create an actual date format column in your table and run an update to populate the value. Make sure you have an index on this field. Then your request might look like this:

SELECT log1.date_field as `day`, COUNT(DISTINCT log2.date_field) as `dau`
FROM log AS log1
INNER JOIN log AS log2 
ON log2.date_field <= log1.date_field
AND log2.date_field >= DATE_SUB(log1.date_field, INTERVAL 7 DAY)
GROUP BY `day`
ORDER BY `day` ASC

      

Then you can fill in this field for all log entries.

+2


source


It's simple and straightforward for users who are active all week long:

select yearweek (ts) as yearwk, user_id, count (user_id) as weeklyactiveusers from log group 1,2 with count (user_id) = 7;

0


source







All Articles