Group with "holes" in the middle

Question

Group with "holes" in the middle

Given the following table:

create table #Track (id int identity, vehicle int, station varchar(50), pieces int, distance int)
insert into #Track (vehicle, station, pieces, distance)
values 
(1, 'A', 10, 0), (1, 'B', 10, 50), (1, 'C', 11, 23), (1, 'D', 11, 40), (1, 'E', 10, 5)

This is the result I want (note the change in the shape field between C and D):

vehicle station_from    station_to  pieces  distance_all
1            A             B          10      50
1            C             D          11      63
1            E             E          10       5

If I make this request:

select  A.vehicle,
        T1.station station_from,
        T2.station station_to,
        A.pieces,
        A.distance_all
from (
select  vehicle,
        min(id) min_id,
        max(id) max_id,
        pieces,
        sum(distance) distance_all
from    #Track
group
by      vehicle,
        pieces
) A join #Track T1 on A.min_id = T1.id
    join #Track T2 on A.max_id = T2.id

I got the wrong result (distance_all is correct, but the station from and to is not. It seems that vehicle 1 was going from A to E and then from C to D:

vehicle station_from    station_to  pieces  distance_all
1             A              E        10         55
1             C              D        11         63

How to achieve the desired result without using a cursor (tables are quite large, several million records)

+3

sql sql-server

Gabor June 12. 17 at 11:15

source to share

3 answers

This is a variant of the "gaps and islands" problem. In your case, you can solve this line number difference problem:

select vehicle,
       max(case when seqnum_grp = 1 then station end) as station_from,
       max(case when seqnum_grp_desc = 1 then station end) as station_to,
       pieces,
       sum(pieces) as pieces_all
from (select t.*,
             row_number() over (partition by vehicle, pieces, (seqnum - seqnum_p) order by id) as seqnum_grp,
             row_number() over (partition by vehicle, pieces, (seqnum - seqnum_p) order by id desc) as seqnum_grp_desc
      from (select t.*,
                   row_number() over (partition by vehicle order by id) as seqnum,
                   row_number() over (partition by vehicle, pieces order by id) as seqnum_p
           from #Track t
          ) t
     ) t
group by vehicle, pieces, (seqnum - seqnum_p);

To understand how this works, you need to understand why the difference in line numbers identifies groups. To do this, you just need to run the innermost subquery and look at the results.

This is a little more difficult than most of these problems because you want the first and last stations along the way. Hence, there is an additional subquery.

+2

Gordon linoff June 12. 17 at 11:22

source to share

you can request as below:

Select vehicle, min(Station) as Station_From, Max(station) Station_To, pieces, sum(distance) as Distance_all 
from (
    Select *, [Bucket] =Row_number() over(order by id) - Row_number() over(partition by pieces order by id)
    from #Track
) a
group by vehicle, pieces, [Bucket]

Output:

+---------+--------------+------------+--------+--------------+
| vehicle | Station_From | Station_To | pieces | Distance_all |
+---------+--------------+------------+--------+--------------+
|       1 | A            | B          |     10 |           50 |
|       1 | E            | E          |     10 |            5 |
|       1 | C            | D          |     11 |           63 |
+---------+--------------+------------+--------+--------------+

+1

Kannan Kandasamy June 12. 17 at 11:34 am

source to share

Vamsi Prabhala · Accepted Answer · 2017-06-12T11:30:24+0000

I think this is what you are trying to do. Processing consecutive lines with the same values for a vehicle as one group and obtaining the minimum and maximum station, the sum of the distance for this group. Use lag

to get the value of the previous row and reset the group (in the current amount) when it differs from the current row. After that, it's just a grouping operation.

select distinct vehicle
,first_value(station) over(partition by vehicle,grp order by id) as station_from
,first_value(station) over(partition by vehicle,grp order by id desc) as station_to
,pieces
,sum(distance) over(partition by vehicle,grp) as distance_all
from (select t.* ,sum(case when prev_pieces=pieces then 0 else 1 end) over(partition by vehicle order by id) as grp
      from (select t.*,lag(pieces) over(partition by vehicle order by id) as prev_pieces
            from Track t
           ) t
     ) t

Sample Demo

Group with "holes" in the middle

More articles: