How to compute a running sum where each row includes the sum of the previous rows
I have a [visit] table. For each user_id, I need a running sum of visit_duration_seconds over the rows where order_number is null; for example, for user [2875636] I get: 61 + 151 + 33 + 13. Each row should contain its own duration plus the sum of the rows before it.
Please also see the RESULT column in the expected result below.
user_id starttime visit_duration_seconds order_number
2875636 2013-01-16 18:03:50 61
2875636 2013-01-16 18:08:18 151
2875636 2013-01-16 18:15:43 33
2875636 2013-01-16 18:16:37 13
2875636 2013-01-16 18:18:01 2011 10177888
2875636 2013-01-16 18:24:35 1172 10177884
2875636 2013-01-16 18:32:03 4731
2875636 2013-01-16 18:33:27 407
2875636 2013-01-16 18:37:29 74
2875636 2013-01-16 18:48:55 80
2875636 2013-01-16 19:05:00 1955
2875636 2013-01-16 19:14:12 326
2875636 2013-01-16 19:23:39 972
2875636 2013-01-16 19:33:05 5440
2875636 2013-01-16 19:35:48 43
2875636 2013-01-16 19:41:10 66
2875636 2013-01-16 19:42:03 100
2875636 2013-01-16 19:42:12 2414 10177940
2875636 2013-01-16 19:49:05 432 10177925
2875636 2013-01-16 19:50:19 183
2875636 2013-01-16 19:52:46 2061
2875636 2013-01-16 19:52:53 400
2875636 2013-01-16 20:00:47 338
2875636 2013-01-16 20:08:58 216
2875636 2013-01-16 20:14:21 58
2875636 2013-01-16 20:14:26 196
2875636 2013-01-16 20:19:14 2189
2875636 2013-01-16 20:21:29 424
2875636 2013-01-16 20:24:42 999
2875636 2013-01-16 21:01:39 1810
2875636 2013-01-16 21:02:54 525
2875636 2013-01-16 21:10:06 27
2875636 2013-01-16 21:12:08 282
2875636 2013-01-16 21:51:02 6
2875636 2013-01-16 22:18:34 173
2875636 2013-01-16 23:02:58 318
2875636 2013-01-16 23:45:37 207
3018868 2013-01-16 16:01:45 18
3018868 2013-01-16 16:16:45 39
3018868 2013-01-16 16:22:55 656
3018868 2013-01-16 16:25:54 1852
3018868 2013-01-16 16:29:23 688
3018868 2013-01-16 16:47:26 2258 10177846
3018868 2013-01-16 16:57:41 572
3018868 2013-01-16 17:06:47 1431
3018868 2013-01-16 17:18:32 29
3018868 2013-01-16 17:21:57 45
3018868 2013-01-16 17:29:23 16
3018868 2013-01-16 17:36:47 490
EXPECTED RESULT
user_id starttime visit_duration_seconds order_number RESULT
2875636 2013-01-16 18:03:50 61 61
2875636 2013-01-16 18:08:18 151 212
2875636 2013-01-16 18:15:43 33 245
2875636 2013-01-16 18:16:37 13 258
2875636 2013-01-16 18:18:01 2011 10177888 0
2875636 2013-01-16 18:24:35 1172 10177884 0
2875636 2013-01-16 18:32:03 4731 4731
2875636 2013-01-16 18:33:27 407 5138
2875636 2013-01-16 18:37:29 74 5212
2875636 2013-01-16 18:48:55 80 ...
2875636 2013-01-16 19:05:00 1955 ...
2875636 2013-01-16 19:14:12 326 ...
2875636 2013-01-16 19:23:39 972
2875636 2013-01-16 19:33:05 5440
2875636 2013-01-16 19:35:48 43
2875636 2013-01-16 19:41:10 66
2875636 2013-01-16 19:42:03 100
2875636 2013-01-16 19:42:12 2414 10177940
2875636 2013-01-16 19:49:05 432 10177925
2875636 2013-01-16 19:50:19 183
2875636 2013-01-16 19:52:46 2061
2875636 2013-01-16 19:52:53 400
2875636 2013-01-16 20:00:47 338
2875636 2013-01-16 20:08:58 216
2875636 2013-01-16 20:14:21 58
2875636 2013-01-16 20:14:26 196
2875636 2013-01-16 20:19:14 2189
2875636 2013-01-16 20:21:29 424
2875636 2013-01-16 20:24:42 999
2875636 2013-01-16 21:01:39 1810
2875636 2013-01-16 21:02:54 525
2875636 2013-01-16 21:10:06 27
2875636 2013-01-16 21:12:08 282
2875636 2013-01-16 21:51:02 6
2875636 2013-01-16 22:18:34 173
2875636 2013-01-16 23:02:58 318
2875636 2013-01-16 23:45:37 207
3018868 2013-01-16 16:01:45 18
3018868 2013-01-16 16:16:45 39
3018868 2013-01-16 16:22:55 656
3018868 2013-01-16 16:25:54 1852
3018868 2013-01-16 16:29:23 688
3018868 2013-01-16 16:47:26 2258 10177846
3018868 2013-01-16 16:57:41 572
3018868 2013-01-16 17:06:47 1431
3018868 2013-01-16 17:18:32 29
3018868 2013-01-16 17:21:57 45
3018868 2013-01-16 17:29:23 16
3018868 2013-01-16 17:36:47 490
You can use MySQL user variables to emulate analytic functions. (There are other approaches, such as using a semi-join or using a correlated subquery. I can also suggest solutions for these if you think they might be more appropriate.)
To emulate the analytic "running total" function, try something like this:
SELECT t.user_id
     , t.starttime
     , t.order_number
       -- reset the total on rows that have an order, otherwise accumulate
     , IF(t.order_number IS NOT NULL,
          @tot_dur := 0,
          @tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
  FROM visit t
  JOIN (SELECT @tot_dur := 0) d   -- initializes the variable
 ORDER BY t.user_id, t.starttime
The "trick" here is to use the IF function to test whether order_number is NULL. When it is NULL, we add the duration value to the variable; otherwise we set the variable to zero.
We use an inline view (aliased as d) to initialize the @tot_dur variable to zero.
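For anyone who finds the variable logic hard to follow in SQL, here is a minimal Python sketch of what the IF expression does row by row. It is not part of the MySQL solution; the function name running_totals and the hard-coded sample rows (the first nine rows for user 2875636 from the question) are only for illustration, and the rows are assumed to be already sorted by user_id and starttime.

```python
# Rows: (user_id, starttime, visit_duration_seconds, order_number).
# Assumed to be pre-sorted by user_id, starttime (what ORDER BY does in the query).
rows = [
    (2875636, "2013-01-16 18:03:50", 61, None),
    (2875636, "2013-01-16 18:08:18", 151, None),
    (2875636, "2013-01-16 18:15:43", 33, None),
    (2875636, "2013-01-16 18:16:37", 13, None),
    (2875636, "2013-01-16 18:18:01", 2011, 10177888),
    (2875636, "2013-01-16 18:24:35", 1172, 10177884),
    (2875636, "2013-01-16 18:32:03", 4731, None),
    (2875636, "2013-01-16 18:33:27", 407, None),
    (2875636, "2013-01-16 18:37:29", 74, None),
]


def running_totals(rows):
    """Hypothetical helper mirroring the IF(...) expression in the query."""
    tot_dur = 0  # plays the role of @tot_dur, set to zero by the inline view d
    out = []
    for _user_id, _starttime, duration, order_number in rows:
        if order_number is not None:
            tot_dur = 0           # a row with an order resets the total
        else:
            tot_dur += duration   # otherwise keep accumulating
        out.append(tot_dur)
    return out


totals = running_totals(rows)
print(totals)  # [61, 212, 245, 258, 0, 0, 4731, 5138, 5212]
```

The printed values match the RESULT column in the question for these rows.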
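NOTE. Be careful when using MySQL user variables like this. In a SELECT statement like the one above, the assignments in the SELECT list are in practice evaluated after the ORDER BY is applied, which is what gives us deterministic behavior here, but the MySQL documentation does not guarantee the order of evaluation for expressions involving user variables.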
This query does not handle "breaks" in user_id. To handle them, we need the user_id value from the previous row. We can store this in another user variable. The order of operations matters: we need to do the accumulation before we overwrite the user_id saved from the previous row.
That means we either reorder the columns so that user_id appears after tot_dur, or include a second copy of the user_id column:
SELECT t.user_id
     , t.starttime
     , t.order_number
       -- start from zero when the user changes, reset on rows with an order
     , IF(t.order_number IS NULL,
          @tot_dur := IF(@prev_user_id = t.user_id, @tot_dur, 0) + t.visit_duration_seconds,
          @tot_dur := 0
       ) AS tot_dur
       -- must come after tot_dur so the comparison above sees the previous row's user
     , @prev_user_id := t.user_id AS prev_user_id
  FROM visit t
  JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
 ORDER BY t.user_id, t.starttime
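As a rough illustration of why the assignment order matters, here is the same logic as a small Python sketch. The function name running_totals_per_user and the four sample rows (the last two rows of user 2875636 and the first two of user 3018868) are my own choices for the example; the point is only that the previous user is compared before it is overwritten.

```python
def running_totals_per_user(rows):
    """Hypothetical helper mirroring the query with @tot_dur and @prev_user_id.

    rows: (user_id, starttime, visit_duration_seconds, order_number),
    assumed to be pre-sorted by user_id, starttime.
    """
    tot_dur = 0
    prev_user_id = None  # like @prev_user_id := NULL in the inline view d
    out = []
    for user_id, _starttime, duration, order_number in rows:
        if order_number is None:
            # Carry the total over only if this row belongs to the same user.
            base = tot_dur if prev_user_id == user_id else 0
            tot_dur = base + duration
        else:
            tot_dur = 0
        out.append((user_id, tot_dur))
        # Overwrite the previous user only after the accumulation is done.
        prev_user_id = user_id
    return out


rows = [
    (2875636, "2013-01-16 23:02:58", 318, None),
    (2875636, "2013-01-16 23:45:37", 207, None),
    (3018868, "2013-01-16 16:01:45", 18, None),
    (3018868, "2013-01-16 16:16:45", 39, None),
]
result = running_totals_per_user(rows)
print(result)
# [(2875636, 318), (2875636, 525), (3018868, 18), (3018868, 57)]
```

If prev_user_id were updated at the top of the loop instead, the comparison would always succeed and the total would leak from one user into the next.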
The values returned in the user_id and prev_user_id columns are identical. The "extra" column can be dropped, or the columns can be reordered by wrapping the query (as an inline view) in an outer query, although this comes at the expense of performance:
SELECT v.user_id
     , v.starttime
     , v.order_number
     , v.tot_dur
  FROM (SELECT t.starttime
             , t.order_number
             , IF(t.order_number IS NULL,
                  @tot_dur := IF(@prev_user_id = t.user_id, @tot_dur, 0) + t.visit_duration_seconds,
                  @tot_dur := 0
               ) AS tot_dur
               -- assigned last, and returned as the user_id column
             , @prev_user_id := t.user_id AS user_id
          FROM visit t
          JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
         ORDER BY t.user_id, t.starttime
       ) v
This query shows that MySQL can return the specified result set. But for optimal performance, we would run only the query in the inline view (aliased as v) and handle the column reordering (putting the user_id column first) on the client side when the rows are fetched.
Two other general approaches are a semi-join and a correlated subquery, although both can be more expensive when processing large sets.
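To give a feel for why a correlated subquery tends to cost more, here is a hedged Python sketch of the idea: for every row, rescan the user's rows to find the most recent order and then sum the durations after it. This is my own illustration, not the SQL itself; the function name running_total_by_rescan is hypothetical, and it assumes starttime values are unique per user and stored as sortable strings.

```python
def running_total_by_rescan(rows):
    """Hypothetical per-row rescan, analogous to a correlated subquery.

    rows: (user_id, starttime, visit_duration_seconds, order_number).
    Assumes starttime is unique per user and compares correctly as a string.
    """
    out = []
    for user_id, starttime, _duration, order_number in rows:
        if order_number is not None:
            out.append(0)  # rows with an order report zero
            continue
        # Latest order row for this user before the current row ("" if none).
        last_order = max(
            (s for u, s, _d, o in rows
             if u == user_id and o is not None and s < starttime),
            default="",
        )
        # Sum every row of this user after that order, up to the current row.
        out.append(sum(
            d for u, s, d, _o in rows
            if u == user_id and last_order < s <= starttime
        ))
    return out


rows = [
    (2875636, "2013-01-16 18:03:50", 61, None),
    (2875636, "2013-01-16 18:08:18", 151, None),
    (2875636, "2013-01-16 18:15:43", 33, None),
    (2875636, "2013-01-16 18:16:37", 13, None),
    (2875636, "2013-01-16 18:18:01", 2011, 10177888),
    (2875636, "2013-01-16 18:24:35", 1172, 10177884),
    (2875636, "2013-01-16 18:32:03", 4731, None),
    (2875636, "2013-01-16 18:33:27", 407, None),
    (2875636, "2013-01-16 18:37:29", 74, None),
]
rescan_totals = running_total_by_rescan(rows)
print(rescan_totals)  # [61, 212, 245, 258, 0, 0, 4731, 5138, 5212]
```

Unlike the user-variable approach, this does not depend on row order, but each row triggers another pass over the data, so the work grows roughly with the square of the row count unless suitable indexes help.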