Using an aggregate function that resets after a condition is met?

I am working with event data and am trying to compute the time spent in an application by summing the differences between each timestamp and the previous one. The problem is that I need to reset this sum every time the value of the packageName column changes. I've tried the following:

SELECT    
    SUM(timeDifference) OVER(PARTITION BY packageName ORDER BY sNumber, timestamp) as accTime,
    *
FROM table.name
ORDER BY
    sNumber, timestamp

      

However, the result is too smart for my purposes: the running total remembers the earlier rows for the same package and keeps adding to them. I need it to start over at each new section instead.

Is there a way to reset the aggregation? Below are examples of what I am getting and what my desired output is. Any help would be much appreciated.

What I get:

**accTime      diff         packageName**
10              10          com.package.1
20              20          com.package.1
10              10          com.package.2
20              20          com.package.2
30              10          com.package.1

      

What I want:

**accTime      diff         packageName**
10              10          com.package.1
20              20          com.package.1
10              10          com.package.2
20              20          com.package.2
10              10          com.package.1

      

The second example shows the accumulated time for com.package.1 being reset when that package reappears, which is the part I need help with.

To explain further, here is a sample of the raw data:

**timestamp          packageName          sNumber      eventID      diff**
  1433119125117      com.package.1        xx123xx      event1       null
  1433119125200      com.package.1        xx123xx      event2         83
  1433119125400      com.package.2        xx123xx      event3        200
  1433119125600      com.package.2        xx123xx      event4        200
  1433119125800      com.package.1        xx123xx      event5        200
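
To pin down the behaviour being asked for, here is a minimal Python sketch (not SQL, and not part of the original post) of a running total that restarts whenever packageName changes between consecutive rows. The function name `running_total_with_reset` is made up for illustration; the input is the diff and packageName columns from the example tables in the question.

```python
from itertools import accumulate, groupby


def running_total_with_reset(rows):
    """Return running sums of diff that restart at each new run of package names.

    rows: iterable of (diff, package_name) tuples, already in event order.
    """
    totals = []
    # groupby only merges *consecutive* equal keys, so a package that
    # reappears later starts a fresh group (and a fresh total).
    for _, run in groupby(rows, key=lambda r: r[1]):
        totals.extend(accumulate(diff for diff, _ in run))
    return totals


rows = [
    (10, "com.package.1"),
    (20, "com.package.1"),
    (10, "com.package.2"),
    (20, "com.package.2"),
    (10, "com.package.1"),
]
print(running_total_with_reset(rows))  # [10, 30, 10, 30, 10]
```

Note that a true running total gives 30 rather than 20 for the second row of each run (10 + 20), so the accTime figures in the example tables look illustrative rather than computed. The key point is the last row, which restarts at 10.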

      



2 answers


Using the LAG function (you'll notice my answer is similar to Pentium's), I think this is what you want.

I'm not 100% sure, since your accTime seems inconsistent with diff. To me, accTime should be the previous accTime plus diff, no? (If I'm wrong, tell me; with the query as it stands, it is easy to adjust. :))

SELECT
  timestamp,package,sNumber,eventID,diff,
  CASE WHEN lagPackage IS NULL then 0
  WHEN package != lagPackage THEN diff 
  ELSE (diff + IF(lagDiff is null, 0,lagDiff)) END AS accTime
FROM (
  SELECT
    *,
    LAG(package,1) OVER (ORDER BY timestamp) AS lagPackage,
    LAG(diff,1,0) OVER (ORDER BY timestamp) AS lagDiff
  FROM (
    SELECT
      1433119125117 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event1' AS eventID,
      NULL AS diff),
    (
    SELECT
      1433119125200 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event2' AS eventID,
      83 AS diff),
    (
    SELECT
      1433119125400 AS timestamp,
      'com.package.2' AS package,
      'xxx123xxx' AS sNumber,
      'event3' AS eventID,
      200 AS diff),
    (
    SELECT
      1433119125600 AS timestamp,
      'com.package.2' AS package,
      'xxx123xxx' AS sNumber,
      'event4' AS eventID,
      200 AS diff),
    (
    SELECT
      1433119125800 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event5' AS eventID,
      200 AS diff),
  ORDER BY
    timestamp )

      



For the sample data you gave, this returns:

Row  timestamp      package        sNumber    eventID  diff  accTime
1    1433119125117  com.package.1  xxx123xxx  event1   null  0
2    1433119125200  com.package.1  xxx123xxx  event2   83    83
3    1433119125400  com.package.2  xxx123xxx  event3   200   200
4    1433119125600  com.package.2  xxx123xxx  event4   200   400
5    1433119125800  com.package.1  xxx123xxx  event5   200   200
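
One thing worth checking about the query above: `diff + lagDiff` looks only one row back, so it matches a running total only while a run of the same package is at most two rows long. The sketch below reproduces the same CASE logic on a made-up four-row table, using Python's built-in sqlite3 module as a stand-in engine. That choice is an assumption (the answer appears to target BigQuery), it needs SQLite 3.25 or newer for window functions, and the table name `events` and the diff values are hypothetical.

```python
import sqlite3

# Hypothetical data: one row of package A, then a three-row run of package B.
rows = [
    (1, "com.package.A", 5),
    (2, "com.package.B", 10),
    (3, "com.package.B", 11),
    (4, "com.package.B", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, package TEXT, diff INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Same CASE logic as the answer, in SQLite syntax
# (COALESCE replaces IF(lagDiff IS NULL, 0, lagDiff)).
one_step = conn.execute("""
    SELECT CASE
             WHEN lagPackage IS NULL THEN 0
             WHEN package != lagPackage THEN diff
             ELSE diff + COALESCE(lagDiff, 0)
           END AS accTime
    FROM (
      SELECT *,
             LAG(package, 1) OVER (ORDER BY ts) AS lagPackage,
             LAG(diff, 1, 0) OVER (ORDER BY ts) AS lagDiff
      FROM events
    )
    ORDER BY ts
""").fetchall()

acc_times = [r[0] for r in one_step]
print(acc_times)  # [0, 10, 21, 23]
```

A full running total over the three-row run would end at 10 + 11 + 12 = 33, whereas the last row here is 12 + 11 = 23. For the five sample rows in the question no run is longer than two rows, so the difference does not show up there.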

      



In the meantime I experimented with some sample data. This is not a complete answer, but it might help someone.

select 
  pos,label,diff,
  if (lag!=label or lag is null,1,0) as reset
from(
  select 
    pos,label,diff,
    LAG(label, 1) OVER (ORDER BY pos asc) lag,
  from (select 10 as diff,'first' as label, 1 as pos),
    (select 20 as diff,'first' as label, 2 as pos),
    (select 10 as diff,'second' as label, 3 as pos),
    (select 20 as diff,'second' as label, 4 as pos),
    (select 10 as diff,'first' as label, 5 as pos),
    (select 11 as diff,'first' as label, 6 as pos),
    (select 12 as diff,'first' as label, 7 as pos),
  order by pos
)

      



This returns:

+-----+-----+--------+------+-------+
| Row | pos | label  | diff | reset |
+-----+-----+--------+------+-------+
|   1 |   1 | first  |   10 |     1 |
|   2 |   2 | first  |   20 |     0 |
|   3 |   3 | second |   10 |     1 |
|   4 |   4 | second |   20 |     0 |
|   5 |   5 | first  |   10 |     1 |
|   6 |   6 | first  |   11 |     0 |
|   7 |   7 | first  |   12 |     0 |
+-----+-----+--------+------+-------+
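
One possible way to finish this idea, sketched under assumptions rather than taken from the answer: a running sum of the reset flag gives every run of identical labels its own id, and a SUM partitioned by that id then restarts at each run. The example uses Python's built-in sqlite3 module as a stand-in engine (SQLite 3.25 or newer is needed for window functions); the table name `sample` and the column names `run_id` and `accTime` are chosen for illustration, and the syntax may need adapting for BigQuery.

```python
import sqlite3

# Sample rows from the answer above: (pos, label, diff).
rows = [
    (1, "first", 10), (2, "first", 20),
    (3, "second", 10), (4, "second", 20),
    (5, "first", 10), (6, "first", 11), (7, "first", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (pos INTEGER, label TEXT, diff INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

query = """
WITH flagged AS (
  -- 1 on the first row of each run of identical labels, else 0.
  -- On the very first row LAG is NULL, the comparison is not true, so reset = 1.
  SELECT pos, label, diff,
         CASE WHEN label = LAG(label) OVER (ORDER BY pos) THEN 0 ELSE 1 END AS reset
  FROM sample
),
grouped AS (
  -- Running count of resets = id of the run each row belongs to.
  SELECT pos, label, diff,
         SUM(reset) OVER (ORDER BY pos) AS run_id
  FROM flagged
)
SELECT pos, label, diff,
       SUM(diff) OVER (PARTITION BY run_id ORDER BY pos) AS accTime
FROM grouped
ORDER BY pos
"""
result = conn.execute(query).fetchall()

acc_times = [row[3] for row in result]
print(acc_times)  # [10, 30, 10, 30, 10, 21, 33]
```

Applied to the question's table, the ordering column would presumably be timestamp, and both window clauses would likely also need PARTITION BY sNumber so that runs do not span devices; that is a guess about the data, not something the thread confirms.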

      
