How to group / rank records based on column value change?

I have the following table, ordered by Id, Year DESC

Id Year Valid
1 2011 1
1 2010 1
1 2009 0
1 2002 1
4 2013 1
4 2012 1
4 2011 1
etc.

I would like to have an additional rank field, for example:

Id Year Valid Rank
1 2011 1 1
1 2010 1 1
1 2009 0 2
1 2002 1 3
4 2013 1 1
4 2012 1 1
4 2011 1 1
etc.

Basically on the variable ranking of the same name for each change in the Valid field. So I can query in the rank = 1 field that all Valid = 1 for each Id up to the first Valid = 0. Or is there an easier way to select the first top number of records that match a certain condition (for Id = 1, only the first two records ). I've already played around with ROW_NUMBER (), RANK () and PARTITION BY, but I can't seem to get it to work. Query nesting should be avoided as the actual query is against a large database.

Any ideas?

Thank you and greetings, Kotelnikova

+2


source to share


3 answers


yes, using Left JOIN we can do it. see below code and result.

The first image is the actual data inserted and the second image is expected.

enter image description here



DECLARE @t TABLE
(
    id      INT
    ,_YEAR  INT
    ,valid  TINYINT
)
INSERT INTO @t( id, [_YEAR], valid )
            SELECT 1,2011,1
UNION ALL   SELECT 1,2010,1
UNION ALL   SELECT 1,2009,0
UNION ALL   SELECT 1,2002,1
UNION ALL   SELECT 4,2013,1
UNION ALL   SELECT 4,2012,1
UNION ALL   SELECT 4,2011,1
UNION ALL   SELECT 5,2013,0
UNION ALL   SELECT 5,2011,1
UNION ALL   SELECT 5,2010,1
UNION ALL   SELECT 6,2010,1
UNION ALL   SELECT 6,2011,0
UNION ALL   SELECT 6,2014,1


SELECT  q1.*
FROM @t q1
LEFT JOIN 
(
    SELECT id,MAX(_YEAR) ZeroYear
    FROM @t
    WHERE valid = 0
    GROUP BY id
)q2
    ON q1.id=q2.id
WHERE 
(q2.ID IS NULL)
OR
(q2.id IS NOT NULL AND q1.id IS NOT NULL AND q1.id=q2.id AND q1.[_YEAR] > q2.ZeroYear)

      

Edit-1: In the above query for the ZeroYear column, I did MIN (_YEAR) earlier, but as you can see in the comment from "Andriy M" instead of MIN, the right MAX function.

+1


source


This is somewhat similar to @ Anup Shah's suggestion , but does not use a connection and instead uses the window aggregation function:

WITH derived AS (
  SELECT
    Id,
    Year,
    Valid,
    LatestInvalidYear = ISNULL(
      MAX(CASE Valid WHEN 0 THEN Year END) OVER (PARTITION BY Id),
      0
    )
  FROM atable
)
SELECT
  Id,
  Year,
  Valid
FROM derived
WHERE Year > LatestInvalidYear
;

      

Basically, the MAX window calculates the last Valid = 0

year for Id

. If no such year is found, MAX results in NULL, which is replaced by 0 by ISNULL. So, for your example, the set derived

will be returned like this:



Id   Year   Valid   LatestInvalidYear
--   ----   -----   -----------------
1    2011   1       2009
1    2010   1       2009
1    2009   0       2009
1    2002   1       2009
4    2013   1       0
4    2012   1       0
4    2011   1       0

      

Obviously, you can now easily apply a filter Year > LatestInvalidYear

to get the rows you want , and that's what the main SELECT does.

+1


source


If you are using SQL 2012 you can use lag

select id, year, valid,
    case when ch = 0 then 1 else lag(ch,1,0) over (order by id, year desc) + 2 end rank
from
    (
        select 
            * ,
            abs(valid - lag(valid,1,1) over (order by id, year desc)) as ch
        from YourTable
    ) t 

      

0


source







All Articles