Joining conditions and aggregate functions

I have a table that records access events (entries and exits) at the front doors.

DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)

      

Examples of entries:

INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')

      

I want to make a query that gives me the following output (based on the above example):

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-01 | 07:02:43 | 07:32:43 |
| Bruce Willis | 2009-09-01 | 07:12:43 | 07:22:43 |
| John Wayne   | 2009-09-01 | 07:37:43 | 07:48:43 |
| Bruce Willis | 2009-09-01 | 07:42:43 | 07:52:43 |

      

The query I wrote is the following:

SELECT [user], accessDate AS [in date], 
    (SELECT MIN(accessDate) 
        FROM @doorStatistics ds2 
        WHERE accessType = 'OUT' 
            AND ds2.accessDate > ds.accessDate 
            AND ds.[user] = ds2.[user]) AS [out date] 
FROM @doorStatistics ds 
WHERE accessType = 'IN'

      

But this is not good: when a user forgets to register an entry (IN), the result looks, for example, something like this:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | 07:02:43 | 09:26:43 |

      

whereas it should be:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | NULL     | 09:26:43 |

      

The second reason the query does not fit is performance: I have over 200,000 records, and running a correlated SELECT for each row slows the query down.

A possible solution would be to join the two result sets

SELECT * FROM @doorStatistics WHERE accessType = 'IN'

and

SELECT * FROM @doorStatistics WHERE accessType = 'OUT'

      

but I don't know what join conditions to use to get the correct dates. Maybe MAX or MIN could help, but I have no idea how.

I don't want to create a temporary table or use cursors.

+2




3 answers


When designing databases for temporal events that have a duration, it is best practice to store the "IN" and "OUT" times in the same row.

All the queries you need to write then become much easier.



See "Joe Celko's SQL Programming Style", where he talks about temporal cohesion on pages 48 and 154.
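
As a rough illustration of that design, here is a minimal sketch. The table variable @doorVisits and its columns inTime and outTime are hypothetical names chosen for this example; they are not part of the question's schema.

```sql
-- Hypothetical redesign: one row per visit instead of one row per event.
DECLARE @doorVisits TABLE
( id INT IDENTITY,
  [user] VARCHAR(250) NOT NULL,
  inTime DATETIME NULL,   -- NULL when the entry was never registered
  outTime DATETIME NULL   -- NULL when the exit was never registered
)

INSERT INTO @doorVisits([user], inTime, outTime) VALUES ('John Wayne', '2009-09-02 07:02:43.000', '2009-09-02 07:48:43.000')
INSERT INTO @doorVisits([user], inTime, outTime) VALUES ('John Wayne', NULL, '2009-09-02 09:26:43.000')

-- The requested report becomes a plain projection with no self-join.
SELECT [user],
       CONVERT(CHAR(10), COALESCE(inTime, outTime), 120) AS [date],
       CONVERT(CHAR(8), inTime, 108) AS inHour,
       CONVERT(CHAR(8), outTime, 108) AS outHour
FROM @doorVisits
ORDER BY COALESCE(inTime, outTime)
```

With this layout the pairing decision is made once, when the row is written, rather than on every read.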

+1




To improve performance at the schema level:

  • I suggest renaming your column accessDate to accessDateTime

  • then create a PERSISTED computed column accessDate based on accessDateTime (shown below). The index you want will then only need to include the accessDate column, which you will use for exact comparison along with user

  • make sure you have the correct indexes on the table (from the code below you will probably need one on "user", "accessDate" and including "accessType")

accessDate column definition:

accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
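
Put together, the suggestions above might look like the following sketch. It assumes a permanent table, here hypothetically named dbo.doorStatistics, because a table variable cannot take a CREATE INDEX statement; the index name is likewise made up for the example.

```sql
-- Assumed permanent version of the table (the name dbo.doorStatistics is hypothetical).
CREATE TABLE dbo.doorStatistics
( id INT IDENTITY PRIMARY KEY,
  [user] VARCHAR(250) NOT NULL,
  accessDateTime DATETIME NOT NULL,   -- the renamed accessDate column
  accessType VARCHAR(5) NOT NULL,
  -- Date-only value, stored physically so it can be indexed.
  accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
);

-- Index on the equality columns, carrying accessType along as suggested.
CREATE NONCLUSTERED INDEX IX_doorStatistics_user_accessDate
    ON dbo.doorStatistics ([user], accessDate)
    INCLUDE (accessType);
```

Whether accessDateTime should also be added to the index is a judgment call that depends on the actual query plan; it is left out here to stay with what the list above describes.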

      



Now, assuming you've done this and you have SQL Server 2005 or later, this awfully long query should do the job:

WITH MatchIN (in_id, out_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                            AND z.accessDateTime <= x.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    WHERE       s.accessType = 'IN'
)
,    MatchOUT (out_id, in_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                            AND z.accessDateTime >= x.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    WHERE       s.accessType = 'OUT'
)

SELECT  COALESCE(i."user", o."user") AS "user",
        COALESCE(i.accessDate, o.accessDate) AS "date",
        CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
        CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM   (SELECT in_id, out_id FROM MatchIN
        UNION -- this also eliminates duplicate pairs at the same time
        SELECT in_id, out_id FROM MatchOUT
        ) x
LEFT JOIN   @doorStatistics i
        ON  i.id = x.in_id
LEFT JOIN   @doorStatistics o
        ON  o.id = x.out_id
ORDER BY    "user", "date", "inHour"

      

To test the handling of missing rows, simply comment out some of your test data INSERT statements.

+1




You need to select the minimum OUT record for each IN record for a given user, after ensuring that there is no intermediate IN record (which would correspond to someone who registered IN twice without leaving the building). This requires some moderately complex SQL (such as a NOT EXISTS clause). You will end up with a self-join on the table, plus an additional NOT EXISTS subquery on the same table. Just make sure you alias every reference to the table.
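
One way that description could be written is sketched below against the question's original schema. The aliases i, o and mid_in and the three sample rows are assumptions for illustration. The sketch only handles IN rows that have no matching OUT; an OUT row with no preceding IN is not reported and would need a symmetric query.

```sql
DECLARE @doorStatistics TABLE
( id INT IDENTITY,
  [user] VARCHAR(250),
  accessDate DATETIME,
  accessType VARCHAR(5)
)

-- Sample data: the user forgot to register the first exit.
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')

SELECT i.[user],
       i.accessDate AS [in date],
       MIN(o.accessDate) AS [out date]   -- earliest qualifying OUT, or NULL
FROM @doorStatistics i
LEFT JOIN @doorStatistics o
       ON o.[user] = i.[user]
      AND o.accessType = 'OUT'
      AND o.accessDate > i.accessDate
      -- Reject an OUT if another IN for the same user lies between the two.
      AND NOT EXISTS (SELECT 1
                      FROM @doorStatistics mid_in
                      WHERE mid_in.[user] = i.[user]
                        AND mid_in.accessType = 'IN'
                        AND mid_in.accessDate > i.accessDate
                        AND mid_in.accessDate < o.accessDate)
WHERE i.accessType = 'IN'
GROUP BY i.id, i.[user], i.accessDate
ORDER BY i.accessDate
```

With the sample rows, the 07:02:43 entry gets a NULL exit because a second IN precedes the only OUT, and the 07:37:43 entry is paired with 07:48:43.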

+1

