Redshift: getting the rank of a row, filtered by a condition
Every time a row is added to a table, I want to know where it ranks relative to the table up to that point. This is easy with the RANK() window function. However, I am struggling to find a way to get its rank relative to the table up to that point when the table is filtered by a value.
As an example, I want to end up with this highly contrived table:
date       | name    | animal_bought | num_sloths_bought_before | num_camels_bought_before
-----------+---------+---------------+--------------------------+--------------------------
2014-09-01 | Vincent | sloth         | 0                        | 0
2014-09-01 | Luis    | camel         | 0                        | 0
2014-09-02 | Vincent | sloth         | 1                        | 0
2014-09-02 | Luis    | camel         | 0                        | 1
2014-09-02 | Kevin   | sloth         | 0                        | 0
2014-09-03 | Vincent | camel         | 1                        | 0
2014-09-04 | Deo     | camel         | 0                        | 0
2014-09-04 | Vincent | sloth         | 2                        | 1
2014-09-05 | Luis    | camel         | 0                        | 2
2014-09-05 | Andrew  | sloth         | 0                        | 0
I first tried applying a filter inside the window function (for example, RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before), but that is not valid syntax. Then I tried adding a subquery:
SELECT
    date,
    name,
    animal_bought,
    ( SELECT
          RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
      FROM this_table
      WHERE animal_bought = 'sloth'
    ) AS num_sloths_bought_before
FROM source_table
but Redshift threw this error:
ERROR: This type of correlated subquery pattern is not supported yet
I've also tried putting the window function in a CASE statement (which throws the same error) and calculating the ranks in a joined query (which I couldn't get to work).
Hmmm. I don't think this query will do everything you want:
SELECT date, name, animal_bought,
(SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
FROM this_table
WHERE animal_bought = 'sloth'
) AS num_sloths_bought_before
FROM source_table
For several reasons:
- The use of rank() suggests that this_table has more than one row that matches animal_bought. Otherwise, you could use an aggregation function.
- If there is only one row that matches the where clause, then the value is always 1, because the where clause is processed before rank().
- Your question only mentions one table, but your query has two.
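To illustrate the second point, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Redshift, and it loads only a few of Vincent's rows from the question into a table named source_table. It shows that the where clause removes rows before rank() is computed:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        ("2014-09-01", "Vincent", "sloth"),
        ("2014-09-02", "Vincent", "sloth"),
        ("2014-09-03", "Vincent", "camel"),
    ],
)

# No filter: the camel purchase is Vincent's third row, so it ranks third.
unfiltered = conn.execute(
    """
    SELECT animal_bought, RANK() OVER (PARTITION BY name ORDER BY date) AS rnk
    FROM source_table
    ORDER BY date
    """
).fetchall()

# With a filter: the sloth rows are gone before RANK() runs,
# so the single remaining camel row ranks first.
filtered = conn.execute(
    """
    SELECT animal_bought, RANK() OVER (PARTITION BY name ORDER BY date) AS rnk
    FROM source_table
    WHERE animal_bought = 'camel'
    """
).fetchall()

print(unfiltered)
print(filtered)
```

So a where clause cannot be used to rank a row against rows that the same clause filters out.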
Perhaps you just want rank() with no subquery?
SELECT date, name, animal_bought,
RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;
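As a rough check of what this returns, here is a small sketch, again assuming SQLite via Python's sqlite3 module in place of Redshift and using only Vincent's rows from the question. Partitioning by both name and animal_bought counts earlier purchases of the same animal as the current row only:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        ("2014-09-01", "Vincent", "sloth"),
        ("2014-09-02", "Vincent", "sloth"),
        ("2014-09-03", "Vincent", "camel"),
        ("2014-09-04", "Vincent", "sloth"),
    ],
)

# Each (name, animal_bought) pair gets its own ranking, so the camel row
# restarts at 0 while the sloth rows keep counting up.
ranks = conn.execute(
    """
    SELECT date, animal_bought,
           RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1
               AS number_bought_before
    FROM source_table
    ORDER BY date
    """
).fetchall()

for row in ranks:
    print(row)
```

This gives one count per row, for that row's own animal, rather than a separate column for each animal.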
If you want it for both animals, then don't use rank(); use a cumulative sum:
SELECT date, name, animal_bought,
sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as SlothsBefore,
sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as CamelsBefore
FROM source_table;
EDIT:
SELECT date, name, animal_bought,
(sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
(case when animal_bought = 'sloth' then 1 else 0 end)
) as SlothsBefore,
(sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
(case when animal_bought = 'camel' then 1 else 0 end)
) as CamelsBefore
FROM source_table;
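Here is a small sketch for trying this out locally. It assumes SQLite (via Python's sqlite3 module) in place of Redshift and loads only Vincent's and Luis's rows from the question. The explicit "rows unbounded preceding" frame is my addition: as far as I know, Redshift requires a frame clause when an aggregate window function has an order by, and SQLite accepts it too. The running sum includes the current row, so subtracting the current row's own contribution leaves the number bought before:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        ("2014-09-01", "Vincent", "sloth"),
        ("2014-09-01", "Luis", "camel"),
        ("2014-09-02", "Vincent", "sloth"),
        ("2014-09-02", "Luis", "camel"),
        ("2014-09-03", "Vincent", "camel"),
        ("2014-09-04", "Vincent", "sloth"),
        ("2014-09-05", "Luis", "camel"),
    ],
)

# Running count per buyer, minus 1 when the current row is itself that animal.
query = """
SELECT date, name, animal_bought,
       SUM(CASE WHEN animal_bought = 'sloth' THEN 1 ELSE 0 END)
           OVER (PARTITION BY name ORDER BY date ROWS UNBOUNDED PRECEDING)
         - (CASE WHEN animal_bought = 'sloth' THEN 1 ELSE 0 END) AS sloths_before,
       SUM(CASE WHEN animal_bought = 'camel' THEN 1 ELSE 0 END)
           OVER (PARTITION BY name ORDER BY date ROWS UNBOUNDED PRECEDING)
         - (CASE WHEN animal_bought = 'camel' THEN 1 ELSE 0 END) AS camels_before
FROM source_table
ORDER BY date, name
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

For Vincent's 2014-09-04 sloth purchase, this gives 2 sloths and 1 camel bought before, matching the table in the question.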