Deterministic sort order for window functions
I have a table status
and I want to get the latest information.
Slno | ID | Status | date
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
I am using a window function with partition by ID order by date desc
to get the first value.
Excluded output:
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
Actual output:
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
According to [ http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF.html] adding a second column ORDER BY
to the window function can fix the problem. But I have no other column to distinguish the lines!
Is there any other approach to solving the problem?
EDIT: I added slno
here for clarity. I do not have slno
as such in the table!
My SQL:
with range as (
select id from status where date between 01-06-2015 and 30-06-2015
), latest as (
select status, id, row_number() OVER (PARTITION BY id ORDER BY date DESC) row_num
)
select * from latest where row_num = 1
source to share
If not in your table slno
, then you have no reliable information about which row was inserted first. There is no natural order in the table, the physical order of the rows can change at any time (with any update or with help VACUUM
, etc.)
You can use an unreliable trick: Inner order ctid
.
select *
from (
select id, status
, row_number() OVER (PARTITION BY id
ORDER BY date, ctid) AS row_num
from status -- that your table name??
where date >= '2015-06-01' -- assuming column is actually a date
and date < '2015-07-01'
) sub
where row_num = 1;
-
In the absence of any other information, which line came first (this is a design error , fix it!), You can try to keep what you can use with the internal tuple ID
ctid
Rows will be in physical order if inserted initially, but this could change at any time with any table write
VACUUM
or other event.
This is an extreme measure and it will break. -
Your submitted query was invalid for multiple counters: no column name in 1st CTE, no table name in 2nd CTE, ...
-
You don't need a CTE for this.
Simplification with DISTINCT ON
(considerations for ctid
apply the same):
SELECT DISTINCT ON (id)
id, status
FROM status
WHERE date >= '2015-06-01'
AND date < '2015-07-01'
ORDER BY id, date, ctid;
source to share