Redshift: getting the rank of a row filtered by condition

Question

Every time I add a row to a table, I want to know where it ranks compared with the table up to that point. This is easily done with the window function RANK(). However, I am struggling to find a way to get where it ranks compared with the table up to that point when the table is filtered by a value.

As an example, I want to end up with this very contrived table:

    date    |   name  | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
 2014-09-01 | Vincent | sloth         | 0                        | 0
 2014-09-01 | Luis    | camel         | 0                        | 0
 2014-09-02 | Vincent | sloth         | 1                        | 0
 2014-09-02 | Luis    | camel         | 0                        | 1
 2014-09-02 | Kevin   | sloth         | 0                        | 0
 2014-09-03 | Vincent | camel         | 1                        | 0
 2014-09-04 | Deo     | camel         | 0                        | 0
 2014-09-04 | Vincent | sloth         | 2                        | 1
 2014-09-05 | Luis    | camel         | 0                        | 2
 2014-09-05 | Andrew  | sloth         | 0                        | 0

I first looked to see if I could apply a filter to a window function (for example, RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before), but that is not syntactically correct. Then I tried adding a subquery:

SELECT
  date,
  name,
  animal_bought,
  ( SELECT
      RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
    FROM this_table
    WHERE animal_bought = 'sloth'
  ) AS num_sloths_bought_before
FROM source_table

but Redshift threw this error:

ERROR:  This type of correlated subquery pattern is not supported yet

I've also tried putting the window function in a case statement (which throws the same error) and calculating the rank in a joined query (couldn't get it to work).


sql amazon-redshift

Serenthia 09 Sep 14 at 13:34


1 answer

Gordon Linoff · Accepted Answer · 2014-09-09T13:44:07+0000

Hmmm. I don't think this query does what you want:

SELECT date, name, animal_bought,
       (SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
        FROM this_table
        WHERE animal_bought = 'sloth'
       ) AS num_sloths_bought_before
FROM source_table

For several reasons:

The use of rank() suggests that this_table has more than one row that matches animal_bought. Otherwise, you could use an aggregation function.
If only one row matches the where clause, then the value is always 1, because the where clause is processed before rank().
Your question only mentions one table, but your query has two.
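
To make the point about where being processed before rank() concrete, here is a minimal sketch against the question's source_table (assuming its date, name and animal_bought columns). The filter removes every non-sloth row first, and only then are the remaining rows ranked, so camel purchases never appear in the result at all:

```sql
SELECT date, name,
       -- RANK() only sees the rows that survive the WHERE clause
       RANK() OVER (PARTITION BY name ORDER BY date ASC) - 1 AS num_sloths_bought_before
FROM source_table
WHERE animal_bought = 'sloth';
```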

Perhaps you just want rank() with no subquery?

SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;

If you want it for both animals, then don't use rank(); use a cumulative sum:

SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date) as SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date) as CamelsBefore
FROM source_table;

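EDIT: To count only the purchases made before the current row, subtract the current row's own contribution from the cumulative sum: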

SELECT date, name, animal_bought,
       (sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date) -
        (case when animal_bought = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date) -
        (case when animal_bought = 'camel' then 1 else 0 end)
       ) as CamelsBefore
       ) as CamelsBefore
FROM source_table;
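
One caveat for Redshift specifically: its documentation states that an aggregate window function with an ORDER BY needs an explicit frame clause, so the cumulative sums above may need one before they run. The following is a minimal sketch, not a tested Redshift query, assuming the source_table columns from the question. It ends the frame one row before the current row, which makes the subtraction unnecessary. The output column names are borrowed from the desired table in the question.

```sql
SELECT date, name, animal_bought,
       -- The frame stops one row before the current row, so the current purchase is not counted.
       -- For the first row of a partition the frame is empty and SUM returns NULL, hence COALESCE.
       COALESCE(SUM(CASE WHEN animal_bought = 'sloth' THEN 1 ELSE 0 END)
                  OVER (PARTITION BY name ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS num_sloths_bought_before,
       COALESCE(SUM(CASE WHEN animal_bought = 'camel' THEN 1 ELSE 0 END)
                  OVER (PARTITION BY name ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS num_camels_bought_before
FROM source_table;
```

If one person can have several rows on the same date, a ROWS frame orders those ties arbitrarily, so a tie-breaking column in the ORDER BY would be needed for stable results.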
