PostgreSQL aggregate function and missing frame rows

I am trying to define a PostgreSQL aggregate function that is aware of rows that the frame clause asks for but that are missing. Specifically, consider an aggregate function framer whose job is to return an array of the values aggregated through it, with any values missing from the frame returned as null. Thus,

select
    n,
    v,
    framer(v) over (order by v rows between 2 preceding and 2 following) arr
from (values (1, 3200), (2, 2400), (3, 1600), (4, 2900), (5, 8200)) as v (n, v)
order by v

      

should return

"n" "v" "arr"
3   1600    {null,null,1600,2400,2900}
2   2400    {null,1600,2400,2900,3200}
4   2900    {1600,2400,2900,3200,8200}
1   3200    {2400,2900,3200,8200,null}
5   8200    {2900,3200,8200,null,null}

      

Basically, I want to grab a range of values around each value, and it's important for me to know whether I'm missing values on the left or on the right (or maybe both). Seems simple enough. I was expecting something like this to work:

create aggregate framer(anyelement) (
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

      

but it returns

"n" "v" "arr"
3   1600    {1600,2400,2900}
2   2400    {1600,2400,2900,3200}
4   2900    {1600,2400,2900,3200,8200}
1   3200    {2400,2900,3200,8200}
5   8200    {2900,3200,8200}

      

So sfunc is really only called three times when two values are missing.

I couldn't think of any non-ridiculous way to capture these missing rows. It seems like there should be a simple solution, such as somehow prepending/appending some null sentinel values to the data before the aggregate runs, or perhaps somehow passing the index (and frame values) to the function along with the actual value ...

I wanted to implement this as an aggregate because it gives the best user experience for what I want to do. Is there a better way?

FWIW, I'm on postgres 9.6.

1 answer


Ok, that was interesting. :)

I created an aggregate framer(anyelement, int) whose second argument lets us size the array according to the size of the window.

First, let's replace array_append with our own framer_msfunc(anyarray, anyelement, int):

CREATE OR REPLACE FUNCTION public.framer_msfunc(arr anyarray, val anyelement, size_ integer)
 RETURNS anyarray
 LANGUAGE plpgsql
AS $function$
DECLARE
    result ALIAS FOR $0;
    null_ val%TYPE := NULL; -- NULL of the same type as `val`
BEGIN

    IF COALESCE(array_length(arr, 1), 0) = 0 THEN
        -- create an array of nulls with the size of `size_`
        result := array_fill(null_, ARRAY[size_]);
    ELSE
        result := arr;
    END IF;

    IF result[size_] IS NULL THEN
        -- first run or after `minvfunc`.
        -- a NULL inserted at the end in `minvfunc` so we want to replace that.
        result[size_] := val;
    ELSE
        -- `minvfunc` not yet called so we just append and drop the first.
        result := (array_append(result, val))[2:];
    END IF;

    RETURN result;

END;
$function$;

      

Then we create the minvfunc, which moving aggregates require.

CREATE OR REPLACE FUNCTION public.framer_minvfunc(arr anyarray, val anyelement, size_ integer)
 RETURNS anyarray
 LANGUAGE plpgsql
AS $function$
BEGIN

    -- drop the first in the array and append a null
    RETURN array_append(arr[2:], NULL);

END;
$function$;
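
To see how the two functions cooperate, here is a rough hand trace of the calls I would expect the window machinery to make for a frame of rows between 2 preceding and 2 following over the ordered values 1500, 1600, 2333, 2400, 2900, 3200. It assumes both functions above have already been created and that the size is 5; the literal arrays and the int[] casts are only there for illustration.

```sql
-- First row (v = 1500): the frame holds three rows, so only forward calls happen.
select framer_msfunc(
           framer_msfunc(
               framer_msfunc('{}'::int[], 1500, 5),  -- {NULL,NULL,NULL,NULL,1500}
               1600, 5),                             -- {NULL,NULL,NULL,1500,1600}
           2333, 5);                                 -- {NULL,NULL,1500,1600,2333}

-- Fourth row (v = 2400): 1500 leaves the frame, then 3200 enters it.
select framer_minvfunc('{1500,1600,2333,2400,2900}'::int[], 1500, 5);
-- {1600,2333,2400,2900,NULL}
select framer_msfunc('{1600,2333,2400,2900,NULL}'::int[], 3200, 5);
-- {1600,2333,2400,2900,3200}
```

So the leading NULLs come from the initial array_fill, and the trailing NULLs are whatever minvfunc appended without a later forward call to overwrite them.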

      

Then we define the aggregate with the moving-aggregate parameters:



create aggregate framer(anyelement, int) (
    sfunc = framer_msfunc,
    stype = anyarray,
    msfunc = framer_msfunc,
    mstype = anyarray,
    minvfunc = framer_minvfunc,
    minitcond = '{}'
);

      

We set framer_msfunc as the sfunc too, because an sfunc is required, but the aggregate doesn't actually work when used as a plain (non-window) aggregate. The sfunc could be replaced by a function with the same arguments that just calls array_append internally, so that plain aggregation would actually do something useful.
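
A minimal sketch of that replacement, in case it helps. The names framer_sfunc and framer_plain are made up for this example, and it assumes framer_msfunc and framer_minvfunc from above already exist:

```sql
-- Hypothetical plain transition function: same arguments as framer_msfunc,
-- but it only appends and ignores size_.
create function framer_sfunc(arr anyarray, val anyelement, size_ integer)
returns anyarray
language sql
as $$
    select array_append(arr, val);
$$;

-- Same definition as framer, except that sfunc points at the plain-append function.
-- The aggregate is named framer_plain only so it does not clash with framer.
create aggregate framer_plain(anyelement, int) (
    sfunc = framer_sfunc,
    stype = anyarray,
    msfunc = framer_msfunc,
    mstype = anyarray,
    minvfunc = framer_minvfunc,
    minitcond = '{}'
);

-- Used without a window it then behaves roughly like array_agg:
select framer_plain(v, 5 order by v) from (values (3), (1), (2)) as t (v);
-- {1,2,3}
```

The size argument is simply ignored in the plain case, which seems acceptable since there is no frame to pad.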

And here's your example, but with a few more inputs.

The size argument must be at least the number of rows in the window frame. This does not work with smaller sizes.

select
    n,
    v,
    framer(v, 5) over (order by v rows between 2 preceding and 2 following) arr
from (values (1, 3200), (2, 2400), (3, 1600), (4, 2900), (5, 8200), (6, 2333), (7, 1500)) as v (n, v)
order by v
;
 n |  v   |            arr
---+------+----------------------------
 7 | 1500 | {NULL,NULL,1500,1600,2333}
 3 | 1600 | {NULL,1500,1600,2333,2400}
 6 | 2333 | {1500,1600,2333,2400,2900}
 2 | 2400 | {1600,2333,2400,2900,3200}
 4 | 2900 | {2333,2400,2900,3200,8200}
 1 | 3200 | {2400,2900,3200,8200,NULL}
 5 | 8200 | {2900,3200,8200,NULL,NULL}
(7 rows)

      

It would be nice if the size could be inferred from the size of the window, but I couldn't find out whether that can be done.
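
As for whether there is a better way: if the offsets are fixed and known up front, one possible alternative that avoids a custom aggregate altogether is to build the array from lag and lead, which return NULL when the requested row does not exist. This is only a sketch for the 2 preceding / 2 following case from the question, and like the aggregate it cannot tell a missing row apart from a value that is itself NULL.

```sql
select
    n,
    v,
    array[
        lag(v, 2)  over w,   -- NULL when there is no row two places before
        lag(v, 1)  over w,
        v,
        lead(v, 1) over w,
        lead(v, 2) over w    -- NULL when there is no row two places after
    ] as arr
from (values (1, 3200), (2, 2400), (3, 1600), (4, 2900), (5, 8200)) as t (n, v)
window w as (order by v)
order by v;
```

The downside is that the offsets are spelled out by hand, so this does not infer anything from the frame clause either.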
