How to do a rolling sum, each line must include the sum of the previous lines

I have a [visit] table. I need to get rows grouped by user_id with the sum of visit_duration_seconds when order_number is null, for example for user [2875636] I get: 61 + 151 + 33 + 13. Each row should contain the sum of the rows before it.
Plz also reference the RESULT column in the expected result below

user_id   starttime           visit_duration_seconds  order_number
2875636   2013-01-16 18:03:50 61  
2875636   2013-01-16 18:08:18 151 
2875636   2013-01-16 18:15:43 33  
2875636   2013-01-16 18:16:37 13  
2875636   2013-01-16 18:18:01 2011                     10177888
2875636   2013-01-16 18:24:35 1172                     10177884
2875636   2013-01-16 18:32:03 4731    
2875636   2013-01-16 18:33:27 407 
2875636   2013-01-16 18:37:29 74  
2875636   2013-01-16 18:48:55 80  
2875636   2013-01-16 19:05:00 1955    
2875636   2013-01-16 19:14:12 326 
2875636   2013-01-16 19:23:39 972 
2875636   2013-01-16 19:33:05 5440    
2875636   2013-01-16 19:35:48 43  
2875636   2013-01-16 19:41:10 66  
2875636   2013-01-16 19:42:03 100 
2875636   2013-01-16 19:42:12 2414                     10177940
2875636   2013-01-16 19:49:05 432                  10177925
2875636   2013-01-16 19:50:19 183 
2875636   2013-01-16 19:52:46 2061    
2875636   2013-01-16 19:52:53 400 
2875636   2013-01-16 20:00:47 338 
2875636   2013-01-16 20:08:58 216 
2875636   2013-01-16 20:14:21 58  
2875636   2013-01-16 20:14:26 196 
2875636   2013-01-16 20:19:14 2189    
2875636   2013-01-16 20:21:29 424 
2875636   2013-01-16 20:24:42 999 
2875636   2013-01-16 21:01:39 1810    
2875636   2013-01-16 21:02:54 525 
2875636   2013-01-16 21:10:06 27  
2875636   2013-01-16 21:12:08 282 
2875636   2013-01-16 21:51:02 6   
2875636   2013-01-16 22:18:34 173 
2875636   2013-01-16 23:02:58 318 
2875636   2013-01-16 23:45:37 207 
3018868   2013-01-16 16:01:45 18  
3018868   2013-01-16 16:16:45 39  
3018868   2013-01-16 16:22:55 656 
3018868   2013-01-16 16:25:54 1852    
3018868   2013-01-16 16:29:23 688 
3018868   2013-01-16 16:47:26 2258                       10177846
3018868   2013-01-16 16:57:41 572 
3018868   2013-01-16 17:06:47 1431    
3018868   2013-01-16 17:18:32 29  
3018868   2013-01-16 17:21:57 45  
3018868   2013-01-16 17:29:23 16  
3018868   2013-01-16 17:36:47 490

      

EXPECTED RESULT

user_id starttime           visit_duration_seconds  order_number        RESULT
2875636 2013-01-16 18:03:50 61                                      61
2875636 2013-01-16 18:08:18 151                                     212
2875636 2013-01-16 18:15:43 33                                      245
2875636 2013-01-16 18:16:37 13                                      258
2875636 2013-01-16 18:18:01 2011                     10177888           0
2875636 2013-01-16 18:24:35 1172                     10177884           0
2875636 2013-01-16 18:32:03 4731                                        4731
2875636 2013-01-16 18:33:27 407                                     5138
2875636 2013-01-16 18:37:29 74                                      5212
2875636 2013-01-16 18:48:55 80                                      ...
2875636 2013-01-16 19:05:00 1955                                        ...
2875636 2013-01-16 19:14:12 326                                     ...
2875636 2013-01-16 19:23:39 972 
2875636 2013-01-16 19:33:05 5440    
2875636 2013-01-16 19:35:48 43  
2875636 2013-01-16 19:41:10 66  
2875636 2013-01-16 19:42:03 100 
2875636 2013-01-16 19:42:12 2414                     10177940
2875636 2013-01-16 19:49:05 432                  10177925
2875636 2013-01-16 19:50:19 183 
2875636 2013-01-16 19:52:46 2061    
2875636 2013-01-16 19:52:53 400 
2875636 2013-01-16 20:00:47 338 
2875636 2013-01-16 20:08:58 216 
2875636 2013-01-16 20:14:21 58  
2875636 2013-01-16 20:14:26 196 
2875636 2013-01-16 20:19:14 2189    
2875636 2013-01-16 20:21:29 424 
2875636 2013-01-16 20:24:42 999 
2875636 2013-01-16 21:01:39 1810    
2875636 2013-01-16 21:02:54 525 
2875636 2013-01-16 21:10:06 27  
2875636 2013-01-16 21:12:08 282 
2875636 2013-01-16 21:51:02 6   
2875636 2013-01-16 22:18:34 173 
2875636 2013-01-16 23:02:58 318 
2875636 2013-01-16 23:45:37 207 
3018868 2013-01-16 16:01:45 18  
3018868 2013-01-16 16:16:45 39  
3018868 2013-01-16 16:22:55 656 
3018868 2013-01-16 16:25:54 1852    
3018868 2013-01-16 16:29:23 688 
3018868 2013-01-16 16:47:26 2258                       10177846
3018868 2013-01-16 16:57:41 572 
3018868 2013-01-16 17:06:47 1431    
3018868 2013-01-16 17:18:32 29  
3018868 2013-01-16 17:21:57 45  
3018868 2013-01-16 17:29:23 16  
3018868 2013-01-16 17:36:47 490 

      

+3


source to share


1 answer


You can use MySQL user variables to emulate analytic functions. (There are other approaches, such as using a semi-join or using a correlated subquery. I can also suggest solutions for these if you think they might be more appropriate.)

To emulate the analyzed "running total" function, try something like this:

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NOT NULL,
         @tot_dur := 0,
         @tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
  FROM visit t
  JOIN (SELECT @tot_dur := 0) d
 ORDER BY t.user_id, t.start_time

      

The "trick" here is to use the IF function to test if t21 is zero. When it is zero, we add the duration value to the variable, otherwise we set the variable to zero.

We use an inline view (with an alias d

) to initialize the @tot_dur variable to zero.

NOTE. Be careful when using MySQL user variables like this. In a SELECT statement as above, the assignment of variables in the SELECT list occurs after the ORDER BY, so we can get deterministic behavior.




This request does not handle "breaks" in user_id. To get this, we need the user_id value from the previous line. We can store this in another custom variable. The order of operations is deterministic, and we need to take care of doing the accumulation before we overwrite the user_id from the previous line.

We need to either reorder the columns so that the user_id appears after tot_dur (or include a second copy of the user_id column)

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NULL,
         @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
         @tot_dur := 0
       ) AS tot_dur
     , @prev_user_id := t.user_id AS prev_user_id
  FROM visit t
  JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
 ORDER BY t.user_id, t.start_time

      

The values ​​returned in the user_id

and columns prev_user_id

are identical. The "extra" column can be dropped, or the columns can be reordered by moving the query (as an inline view) to another query, although this comes at the expense of performance:

SELECT v.user_id
     , v.starttime
     , v.order_number
     , v.tot_dur
  FROM (SELECT t.starttime
             , t.order_number
             , IF(t.order_number IS NULL,
                 @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
                 @tot_dur := 0
               ) AS tot_dur
             , @prev_user_id := t.user_id AS user_id
          FROM visit t
          JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
         ORDER BY t.user_id, t.start_time
       ) v

      

This query shows that MySQL can return the specified result set. But for optimal performance, we would only want to run the query in an inline view (with an alias v

) and handle the column reordering (put the user_id column first) on the client side when the rows are fetched.

Two other general approaches use semi-join and use a correlated subquery, although these approaches can be more intensive when processing large sets.

+2


source







All Articles