Conditional cumulative average in PostgreSQL
I have a simple table that:
DROP TABLE IF EXISTS running_averages;
CREATE TABLE running_averages
(
avg_id SERIAL NOT NULL PRIMARY KEY,
num1 integer,
num2 integer DEFAULT 0
);
INSERT INTO running_averages(num1, num2)
SELECT 100, 100 UNION ALL
SELECT 200, 175 UNION ALL
SELECT -400, NULL UNION ALL
SELECT 300, 200 UNION ALL
SELECT -100, NULL;
In the above table, the "num2" column must be updated with the cumulative average of the previous row if the "num1" column is negative. My current request:
SELECT *,
num1 * num2 AS current_total,
SUM(num1 * num2) OVER(order by avg_id) AS cumulative_sum,
SUM(num1) OVER(order by avg_id) AS culmulative_num1,
CASE WHEN num1 > 0 THEN
SUM(num1 * num2) OVER(order by avg_id)
/
SUM(num1) OVER(order by avg_id)
ELSE
0
END AS cumulative_average
FROM running_averages;
Result:
avg_id num1 num2 current_total cumulative_sum cumulative_num1 cumulative_average
1 100 100 10,000 10,000 100 100
2 200 175 35,000 45,000 300 150
3 -400 NULL 45,00 -100 0
4 300 200 60,000 105,000 200 525
5 -100 NULL 105,000 100 0
I cannot figure out a way to bring the cumulative average from the previous row if the current column num1 of the row is a negative number. Instead of the above, the expected output should be:
avg_id num1 num2 current_total cumulative_sum cumulative_num1 cumulative_average
1 100 100 10,000 10,000 100 100
2 200 175 35,000 45,000 300 150
3 -400 150 -60,000 -15,00 -100 150
4 300 200 60,000 45,000 200 225
5 -100 225 -22,500 22,500 100 225
How can I get the column value of the last row in this case?
Edit:
I have edited the SQL Script above. I really like Gordon Linoff's approach . But it sadly produces the wrong result as per the Script change:
avg_id num1 num2 new_num2
1 100 100 100
2 200 175 175
3 -400 150 150 (Correct)
4 300 200 200
5 -100 225 50 (Incorrect)
Edit 2
I also checked Multisync's answer , it also produces the wrong result:
avg_id num1 num2 current_total cumulative_sum cumulative_num1 cumulative_average
1 100 100 10,000 10,000 100 100
2 200 175 35,000 45,000 300 150
3 -400 150 (Correct) -60,000 -15,00 -100 150
4 300 200 60,000 45,000 200 225
5 -100 175 (Incorrect) -17,500 27,500 100 275
Edit 3
I accepted Multisync's updated answer as it gives the correct results. I would also like to know how I can improve queries like this when we have many aggregate and window functions. Any link on this topic would be helpful.
source to share
I can only think of a recursive query:
with recursive tmp (avg_id, num1, num2, sum_m, sum_num1, last_id) as (
select avg_id, num1, num2, num1 * num2, num1, avg_id
from running_averages where avg_id = 1
union all
select r.avg_id, r.num1,
case when r.num1 < 0 then t.sum_m / t.sum_num1 else r.num2 end,
t.sum_m + case when r.num1 < 0 then t.sum_m / t.sum_num1 else r.num2 end * r.num1,
t.sum_num1 + r.num1,
r.avg_id
from running_averages r join tmp t on r.avg_id = t.last_id + 1
)
select avg_id, num1, num2,
num1 * num2 AS current_total,
SUM(num1 * num2) OVER(order by avg_id) AS cumulative_sum,
SUM(num1) OVER(order by avg_id) AS culmulative_num1,
SUM(num1 * num2) OVER(order by avg_id)
/ SUM(num1) OVER(order by avg_id) AS cumulative_average
from tmp;
avg_id
should contain consequtive numbers (you can use instead row_number()
, I didn't use it for simplicity)
num2
changes during computation, so I can't think of anything else recursive query (the output of the previous step is the input of the next step)
source to share
Focus on this:
In the above table, the "num2" column must be updated with the cumulative average from the previous row if the "num1" column is negative
It shouldn't be too hard:
select ra.*,
(case when num1 >= 0 then num2
else avg(num1) over (order by avg_id rows between unbounded preceding and 1 preceding)
end) as new_num2
from running_averages ra;
I think you can do the rest of the calculations using new_num2
.
source to share