Redshift: getting the rank of a row filtered by condition

Every time I add a row to a table, I want to know where it ranks relative to the table up to that point. This is easily done with the RANK() window function. However, I am struggling to find out where it ranks relative to the table up to that point when the table is filtered by a value.

As an example, I want to end up with this highly contrived table:

    date    |   name  | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
 2014-09-01 | Vincent | sloth         | 0                        | 0
 2014-09-01 | Luis    | camel         | 0                        | 0
 2014-09-02 | Vincent | sloth         | 1                        | 0
 2014-09-02 | Luis    | camel         | 0                        | 1
 2014-09-02 | Kevin   | sloth         | 0                        | 0
 2014-09-03 | Vincent | camel         | 1                        | 0
 2014-09-04 | Deo     | camel         | 0                        | 0
 2014-09-04 | Vincent | sloth         | 2                        | 1
 2014-09-05 | Luis    | camel         | 0                        | 2
 2014-09-05 | Andrew  | sloth         | 0                        | 0

      

I first looked into whether I could apply a filter inside a window function (for example, RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before), but that is not syntactically valid. Then I tried adding a subquery:

SELECT
  date,
  name,
  animal_bought,
  ( SELECT
      RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
    FROM this_table
    WHERE animal_bought = 'sloth'
  ) AS num_sloths_bought_before
FROM source_table

      

but Redshift threw this error:

ERROR:  This type of correlated subquery pattern is not supported yet

      

I've also tried putting the window function in a CASE statement (it throws the same error) and calculating the ranks in a joined query (I couldn't get that to work).



1 answer


Hmmm. I don't think this query would do what you want anyway:

SELECT date, name, animal_bought,
       (SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
        FROM this_table
        WHERE animal_bought = 'sloth'
       ) AS num_sloths_bought_before
FROM source_table

      

For several reasons:

  • Using rank() implies that this_table has more than one row that matches animal_bought. Otherwise, you could use an aggregation function.
  • If only one row matches the where clause, then the value is always 1, because the where clause is processed before rank().
  • Your question mentions only one table, but your query uses two.

Perhaps you just want rank() without a subquery?



SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;
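To see what the partitioned rank() returns, here is a minimal sketch that runs the same query locally. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (for window function support) as a stand-in for Redshift, the column name animal_bought from the question's table, and a subset of the question's sample rows; the function name rank_per_animal is hypothetical.

```python
import sqlite3

# Subset of the question's sample rows: (date, name, animal_bought).
PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
]

# Rank within each (name, animal_bought) group, shifted to start at 0.
RANK_QUERY = """
SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1
           AS number_bought_before
FROM source_table
ORDER BY date, name
"""


def rank_per_animal(purchases):
    """Load the rows into an in-memory SQLite table and run RANK_QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)"
        )
        conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", purchases)
        return conn.execute(RANK_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rank_per_animal(PURCHASES):
        print(row)
```

Each row only receives the count for its own animal, so this form cannot fill a sloth column and a camel column on the same row.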

      

If you want it for both animals, then don't use rank(); use a cumulative sum:

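SELECT date, name, animal_bought,
       -- Redshift requires an explicit frame clause when an aggregate window function has ORDER BY
       sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as CamelsBefore
FROM source_table;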

      

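EDIT: The cumulative sum above includes the current row, so subtract the current row's own contribution to count only earlier purchases: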

SELECT date, name, animal_bought,
       (sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
        (case when animal_bought = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
        (case when animal_bought = 'camel' then 1 else 0 end)
       ) as CamelsBefore
FROM source_table;
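The cumulative-sum approach can be checked against the question's sample rows without a Redshift cluster. The sketch below is only an approximation under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer) in place of Redshift, takes the column name animal_bought from the question's table, and writes an explicit ROWS UNBOUNDED PRECEDING frame, which SQLite accepts and which Redshift requires for ordered aggregate windows. The function name prior_purchase_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question's table: (date, name, animal_bought).
ALL_PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
    ("2014-09-05", "Andrew", "sloth"),
]

# Running count per buyer, minus the current row's own contribution,
# so each row reports only purchases made before it.
CUMULATIVE_QUERY = """
SELECT date, name, animal_bought,
       (SUM(CASE WHEN animal_bought = 'sloth' THEN 1 ELSE 0 END)
            OVER (PARTITION BY name ORDER BY date ROWS UNBOUNDED PRECEDING)
        - (CASE WHEN animal_bought = 'sloth' THEN 1 ELSE 0 END)) AS sloths_before,
       (SUM(CASE WHEN animal_bought = 'camel' THEN 1 ELSE 0 END)
            OVER (PARTITION BY name ORDER BY date ROWS UNBOUNDED PRECEDING)
        - (CASE WHEN animal_bought = 'camel' THEN 1 ELSE 0 END)) AS camels_before
FROM source_table
ORDER BY date, name
"""


def prior_purchase_counts(purchases):
    """Load the rows into an in-memory SQLite table and run CUMULATIVE_QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)"
        )
        conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", purchases)
        return conn.execute(CUMULATIVE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in prior_purchase_counts(ALL_PURCHASES):
        print(row)
```

For Vincent's 2014-09-03 camel purchase this query reports 2 earlier sloths (the purchases on 2014-09-01 and 2014-09-02), whereas the question's table lists 1. If one buyer can have several purchases on the same date, the ROWS frame counts them in an arbitrary order within that date, so a tie-breaking column in ORDER BY may be needed.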

      
