Conditional cumulative average in PostgreSQL

Question

Conditional cumulative average in PostgreSQL

I have a simple table that:

    DROP TABLE IF EXISTS running_averages;

    CREATE TABLE running_averages
    (
            avg_id          SERIAL NOT NULL PRIMARY KEY,
            num1             integer,
            num2             integer   DEFAULT 0     

    );

    INSERT INTO running_averages(num1, num2)
    SELECT 100, 100 UNION ALL
    SELECT 200, 175 UNION ALL
    SELECT -400, NULL UNION ALL
    SELECT 300, 200 UNION ALL
    SELECT -100, NULL;

In the above table, the "num2" column must be updated with the cumulative average of the previous row if the "num1" column is negative. My current request:

    SELECT *,
            num1 * num2 AS current_total,
            SUM(num1 * num2) OVER(order by avg_id) AS cumulative_sum,
            SUM(num1)  OVER(order by avg_id) AS culmulative_num1,

            CASE WHEN num1 > 0 THEN
            SUM(num1 * num2) OVER(order by avg_id) 
            / 
            SUM(num1)  OVER(order by avg_id) 
            ELSE
            0
            END AS cumulative_average
    FROM running_averages;

Result:

avg_id  num1  num2    current_total cumulative_sum   cumulative_num1 cumulative_average
1       100   100     10,000        10,000           100             100
2       200   175     35,000        45,000           300             150
3       -400          NULL          45,00            -100            0
4       300   200     60,000        105,000          200             525
5       -100          NULL          105,000          100               0

I cannot figure out a way to bring the cumulative average from the previous row if the current column num1 of the row is a negative number. Instead of the above, the expected output should be:

avg_id  num1  num2    current_total cumulative_sum   cumulative_num1 cumulative_average
1       100   100     10,000        10,000           100             100
2       200   175     35,000        45,000           300             150
3       -400  150     -60,000       -15,00           -100            150
4       300   200     60,000        45,000           200             225
5       -100  225     -22,500       22,500           100             225

How can I get the column value of the last row in this case?

Edit:

I have edited the SQL Script above. I really like Gordon Linoff's approach . But it sadly produces the wrong result as per the Script change:

avg_id  num1  num2    new_num2
1       100   100     100
2       200   175     175
3       -400  150     150 (Correct)
4       300   200     200
5       -100  225     50  (Incorrect)

Edit 2

I also checked Multisync's answer , it also produces the wrong result:

avg_id  num1  num2              current_total cumulative_sum   cumulative_num1 cumulative_average
1       100   100               10,000        10,000           100             100
2       200   175               35,000        45,000           300             150
3       -400  150 (Correct)     -60,000       -15,00           -100            150
4       300   200               60,000        45,000           200             225
5       -100  175 (Incorrect)   -17,500       27,500           100             275

Edit 3

I accepted Multisync's updated answer as it gives the correct results. I would also like to know how I can improve queries like this when we have many aggregate and window functions. Any link on this topic would be helpful.

+3

sql postgresql

Nancy 16 nov. 14 at 20:18

source to share

2 answers

Focus on this:

In the above table, the "num2" column must be updated with the cumulative average from the previous row if the "num1" column is negative

It shouldn't be too hard:

select ra.*,
       (case when num1 >= 0 then num2
             else avg(num1) over (order by avg_id rows between unbounded preceding and 1 preceding)
        end) as new_num2
from running_averages ra;

I think you can do the rest of the calculations using new_num2

.

+2

Gordon linoff 16 nov. 14 at 22:48

source to share

Multisync · Accepted Answer · 2014-11-16T22:02:39+0000

I can only think of a recursive query:

with recursive tmp (avg_id, num1, num2, sum_m, sum_num1, last_id) as (
  select avg_id, num1, num2, num1 * num2, num1, avg_id
  from running_averages where avg_id = 1  
  union all
  select r.avg_id, r.num1, 
         case when r.num1 < 0 then t.sum_m / t.sum_num1 else r.num2 end, 
         t.sum_m + case when r.num1 < 0 then t.sum_m / t.sum_num1 else r.num2 end * r.num1,
         t.sum_num1 + r.num1,
         r.avg_id 
  from running_averages r join tmp t on r.avg_id = t.last_id + 1
)
select avg_id, num1, num2, 
       num1 * num2 AS current_total,
       SUM(num1 * num2) OVER(order by avg_id) AS cumulative_sum,
       SUM(num1) OVER(order by avg_id) AS culmulative_num1,
       SUM(num1 * num2) OVER(order by avg_id) 
       / SUM(num1) OVER(order by avg_id) AS cumulative_average
from tmp;

avg_id

should contain consequtive numbers (you can use instead row_number()

, I didn't use it for simplicity)

num2

changes during computation, so I can't think of anything else recursive query (the output of the previous step is the input of the next step)

Conditional cumulative average in PostgreSQL

Edit:

Edit 2

Edit 3

More articles: