Finding Header and Tail Events in SQL Server (Optimization)

Question

Finding Header and Tail Events in SQL Server (Optimization)

I have an event table, I need to find all type 1 tail events and all type 1 head events.

So, for a set of events in this order [1, 1], 3, 1, 4, 5, [1,1,1], the parentheses denote head and tail events of type 1.

This is better illustrated in SQL:

drop table #event
go 
create table #event (group_id int, [date] datetime, [type] int)
create index idx1 on #event (group_id, date)


insert into #event values (1, '2000-01-01', 1) 
insert into #event values (1, '2000-01-02', 1) 
insert into #event values (1, '2000-01-03', 3) 
insert into #event values (1, '2000-01-04', 2) 
insert into #event values (1, '2000-01-05', 1) 
insert into #event values (2, '2000-01-01', 2) 
insert into #event values (2, '2000-01-02', 2) 
insert into #event values (2, '2000-01-03', 3) 
insert into #event values (2, '2000-01-04', 2) 
insert into #event values (2, '2000-01-05', 1) 
insert into #event values (3, '2000-01-01', 1) 
insert into #event values (3, '2000-01-02', 2) 
insert into #event values (3, '2000-01-03', 1) 
insert into #event values (3, '2000-01-04', 2) 
insert into #event values (3, '2000-01-05', 2) 
insert into #event values (4, '2000-01-01', 2) 
insert into #event values (4, '2000-01-02', 2) 
insert into #event values (4, '2000-01-03', 3) 
insert into #event values (4, '2000-01-04', 1) 
insert into #event values (4, '2000-01-05', 1) 

go 

select e1.* from #event e1 
where (
    not exists (
        select top 1 1 
        from #event e2 
        where e1.group_id = e2.group_id 
        and e2.date < e1.date 
        and e2.type <> 1
    ) or not exists (
        select top 1 1 
        from #event e2 
        where e1.group_id = e2.group_id 
        and e2.date > e1.date 
        and e2.type <> 1
    ) 
)
and e1.type = 1

Expected results:

1   2000-01-01 00:00:00.000 1
1   2000-01-02 00:00:00.000 1
1   2000-01-05 00:00:00.000 1
2   2000-01-05 00:00:00.000 1
3   2000-01-01 00:00:00.000 1
4   2000-01-04 00:00:00.000 1
4   2000-01-05 00:00:00.000 1

This all works fine and returns my expected results, but it is scanning the table 3 times. Is there a way to do this faster and reduce the number of table scans?

0

optimization sql-server

Sam saffron 13 nov. '08 at 5:36

source to share

9 replies

I think this is better:

select e1.group_id, e1.date, e1.type
from #event e1, #event e2
where e1.type = 1
and e2.type <> 1
and e1.group_id = e2.group_id
group by e1.group_id, e1.date , e1.type, e2.group_id
having e1.date <min (e2.date) or e1.date> max (e2.date)

+1

adamant7 Dec 12. '08 at 17:07

source to share

From what I understand, you are after that - head and tail ordered by day ** for each id **. The head and tail are all records until a record is encountered where the type is not one.

This is another way to do it, it could be faster

;WITH Ranked AS (
    SELECT 
        *,
        Row_Number() OVER (ORDER BY date) as 'rnk'
    FROM #event
)


SELECT * 
FROM Ranked
WHERE rnk not between 
        (SELECT Min(rnk) FROM Ranked r WHERE r.type <> 1 AND ranked.id = r.id)
        AND (SELECT Max(rnk) FROM Ranked r WHERE r.type <> 1 AND ranked.id = r.id)
order by id

You don't need to use TOP with the exist operator, although it doesn't hurt.

0

Robert Wagner 13 nov. '08 at 5:46

source to share

Basically it looks like you want to get a type 1 event that is less than the timestamp than the other events and more than the timestamp of a different date Try this, written in Oracle Syntax, not sure about MSSQL.

select e1. * from e1, where e1.id = 1 and (e1.date <= (select min (e2.date) from e2, where e2.id <> 1 group by e2.date) or (e1.date> = select max (e3.date) from e3, where e3.id <> 1 group by e3.date))

0

Dheer 13 nov. '08 at 6:31

source to share

Consider

create idx2 index on #event (type)

I don't have a SQL Server to check, but in Oracle it will eliminate the top level scan (for the "type = 1" condition).

As for the query itself - in MS SQL 2000 "[no] exists" and "no" in "predicates" almost always a full scan was performed - we replaced them with the corresponding JOINs.

0

Dmitry Khalatov 13 nov. '08 at 6:32

source to share

With only 20 records, the query optimizer can easily conclude that 3 table scans are the fastest way to do this, no matter how you express the query or what indexes you create. With small data records, the entire table is likely to boot with one or more disk reads; and there is not much optimization in the read unit; The optimizer is primarily interested in minimizing disk activity, and everything else is relatively unimportant. What the optimizer does with so many records is not a good indicator of how it will handle large volumes.

Do you have a real table with a lot of data? I cannot imagine that you can at all perceive any exectuion time.

0

dkretz 13 nov. '08 at 7:16

source to share

Here is the unions version:

select distinct e1.* from #event e1 
left outer join #event e2 ON 
        e1.id = e2.id 
        and e2.date < e1.date 
        and e2.type <> 1
left outer join #event e3 ON
        e1.id = e3.id 
        and e3.date > e1.date 
        and e3.type <> 1
where e1.type = 1 AND (e2.id is null or e3.id is null)

This still has three table scans plus a separate clause, but it still seems to be faster than the original query.

0

csgero 13 nov. '08 at 12:48

source to share

Here's a query whose showplan at least doesn't say "table scan" in it and gives the same answer.

I still don't understand the query in logical terms.

SELECT DISTINCT e1. * FROM #event e1
WHERE e1.type = 1
  AND
  (
      DOES NOT EXIST (
          SELECT 1 FROM #event
          WHERE type! = 1
              AND id = e1.id
              AND date <e1.date
      )
      OR DOES NOT EXIST (
          SELECT 1 FROM #event
          WHERE type ! = 1
              AND id = e1.id
              AND date> e1.date
      )
  )

0

dkretz 13 nov. '08 at 19:20

source to share

This is better than the latter:

SELECT e1. * FROM event e1
WHERE e1.type = 1
  AND DOES NOT EXIST (
      SELECT 1 FROM event
      WHERE type! = 1
          AND id = e1.id
          AND date <e1.date
  )
UNION ALL SELECT e1. * FROM event e1
WHERE e1.type = 1
  AND DOES NOT EXIST (
      SELECT 1 FROM event
      WHERE type! = 1
          AND id = e1.id
          AND date> e1.date
  )

The result is the same. But your request, and my previous one, are terrible. Here's the IO stats:

(29985 lines (lines) affected)

Event table. Number of scans 9997, logical read 20477, physical read 0, read-read read 0.
Table "Desktop". Scan count 39979, logical read 99950, physical read 0, read forward - 0.

Here's the same for the most recent request:

(29985 lines (lines) affected)

Event table. Scan count 4, logical read 652, physical read 0, read forward - 0.

Two things to notice -
1. Even in the worst case, the entire table is loaded and there are no physical reads.
2. The bad ones have 9997 + 39979 table scans. Not only 4.

Describe the purpose of the request.

0

dkretz 13 nov. '08 at 19:51

source to share

Sam saffron · Accepted Answer · 2008-11-13T07:32:30+0000

To create a large subset of your data, you can use this:

declare @i int 
set @i = 10000
while @i > 5 
begin
    insert into #event values (@i, '2000-01-01', 1) 
    insert into #event values (@i, '2000-01-02', 1) 
    insert into #event values (@i, '2000-01-03', 3) 
    insert into #event values (@i, '2000-01-04', 2) 
    insert into #event values (@i, '2000-01-05', 1)  
    set @i = @i -1 
end

Also, to include many events in a group, try this:

declare @j int 
set @j = 0 
while @j < 10
begin 
    set nocount on 
    declare @i int 
    set @i = 0
    while @i < 10000 
    begin
        insert into #event values (@j, DateAdd(d, @i, '2000-01-01'), rand(10) * 10) 

        set @i = @i  +1 
    end
    set @j = @j + 1 
end
set nocount off

In all of my testing, it seems that my original query only produces 3 table scans, and I'm not sure if there is any performance improvement here.

Finding Header and Tail Events in SQL Server (Optimization)

More articles: