How to group / rank records based on column value change?
I have the following table, ordered by Id, Year DESC
Id Year Valid 1 2011 1 1 2010 1 1 2009 0 1 2002 1 4 2013 1 4 2012 1 4 2011 1 etc.
I would like to have an additional rank field, for example:
Id Year Valid Rank 1 2011 1 1 1 2010 1 1 1 2009 0 2 1 2002 1 3 4 2013 1 1 4 2012 1 1 4 2011 1 1 etc.
Basically on the variable ranking of the same name for each change in the Valid field. So I can query in the rank = 1 field that all Valid = 1 for each Id up to the first Valid = 0. Or is there an easier way to select the first top number of records that match a certain condition (for Id = 1, only the first two records ). I've already played around with ROW_NUMBER (), RANK () and PARTITION BY, but I can't seem to get it to work. Query nesting should be avoided as the actual query is against a large database.
Any ideas?
Thank you and greetings, Kotelnikova
source to share
yes, using Left JOIN we can do it. see below code and result.
The first image is the actual data inserted and the second image is expected.
DECLARE @t TABLE
(
id INT
,_YEAR INT
,valid TINYINT
)
INSERT INTO @t( id, [_YEAR], valid )
SELECT 1,2011,1
UNION ALL SELECT 1,2010,1
UNION ALL SELECT 1,2009,0
UNION ALL SELECT 1,2002,1
UNION ALL SELECT 4,2013,1
UNION ALL SELECT 4,2012,1
UNION ALL SELECT 4,2011,1
UNION ALL SELECT 5,2013,0
UNION ALL SELECT 5,2011,1
UNION ALL SELECT 5,2010,1
UNION ALL SELECT 6,2010,1
UNION ALL SELECT 6,2011,0
UNION ALL SELECT 6,2014,1
SELECT q1.*
FROM @t q1
LEFT JOIN
(
SELECT id,MAX(_YEAR) ZeroYear
FROM @t
WHERE valid = 0
GROUP BY id
)q2
ON q1.id=q2.id
WHERE
(q2.ID IS NULL)
OR
(q2.id IS NOT NULL AND q1.id IS NOT NULL AND q1.id=q2.id AND q1.[_YEAR] > q2.ZeroYear)
Edit-1: In the above query for the ZeroYear column, I did MIN (_YEAR) earlier, but as you can see in the comment from "Andriy M" instead of MIN, the right MAX function.
source to share
This is somewhat similar to @ Anup Shah's suggestion , but does not use a connection and instead uses the window aggregation function:
WITH derived AS (
SELECT
Id,
Year,
Valid,
LatestInvalidYear = ISNULL(
MAX(CASE Valid WHEN 0 THEN Year END) OVER (PARTITION BY Id),
0
)
FROM atable
)
SELECT
Id,
Year,
Valid
FROM derived
WHERE Year > LatestInvalidYear
;
Basically, the MAX window calculates the last Valid = 0
year for Id
. If no such year is found, MAX results in NULL, which is replaced by 0 by ISNULL. So, for your example, the set derived
will be returned like this:
Id Year Valid LatestInvalidYear
-- ---- ----- -----------------
1 2011 1 2009
1 2010 1 2009
1 2009 0 2009
1 2002 1 2009
4 2013 1 0
4 2012 1 0
4 2011 1 0
Obviously, you can now easily apply a filter Year > LatestInvalidYear
to get the rows you want , and that's what the main SELECT does.
source to share