Joining conditions and aggregate functions
I have a table that records entries and exits through the front door.
DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)
Examples of entries:
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')
I want to make a query that gives me the following output (based on the above example):
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-01 | 07:02:43 | 07:32:43 |
| Bruce Willis | 2009-09-01 | 07:12:43 | 07:22:43 |
| John Wayne | 2009-09-01 | 07:37:43 | 07:48:43 |
| Bruce Willis | 2009-09-01 | 07:42:43 | 07:52:43 |
The query I wrote is the following:
SELECT [user], accessDate AS [in date],
       (SELECT MIN(accessDate)
        FROM @doorStatistics ds2
        WHERE accessType = 'OUT'
          AND ds2.accessDate > ds.accessDate
          AND ds.[user] = ds2.[user]) AS [out date]
FROM @doorStatistics ds
WHERE accessType = 'IN'
But this is not good enough, because when a user forgets to register an entry, it produces, for example, something like this:
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne | 2009-09-02 | 07:02:43 | 09:26:43 |
whereas it should be:
| user | date | inHour | outHour |
|--------------|------------|----------|----------|
| John Wayne | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne | 2009-09-02 | NULL | 09:26:43 |
The second reason the query is unsuitable is performance. I have over 200,000 records, and running a subquery SELECT for each row slows the query down.
A possible solution would be to join the two result sets
SELECT * FROM @doorStatistics WHERE accessType = 'IN'
with
SELECT * FROM @doorStatistics WHERE accessType = 'OUT'
but I don't know what join conditions to use in order to get the correct date. Maybe some MAX or MIN functions could be added, but I have no idea how.
I don't want to create a temporary table and use cursors.
When designing databases for temporal events that have a duration, it is best practice to store the "IN" and "OUT" times in the same row.
All the queries you need to write then become much easier.
See "Joe Celko's SQL Programming Style", where he talks about temporal cohesion on pages 48 and 154.
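To illustrate the single-row design, here is a minimal sketch. The table name @doorVisits and the columns inTime and outTime are hypothetical, chosen only for this example; the two rows mirror the "forgotten entry" case from the question.

```sql
-- Hypothetical table: one row per visit, with both times on the same row.
DECLARE @doorVisits TABLE
( id INT IDENTITY,
  [user] VARCHAR(250) NOT NULL,
  inTime DATETIME NULL,   -- NULL when the entry was never registered
  outTime DATETIME NULL   -- NULL when the exit was never registered
);

INSERT INTO @doorVisits ([user], inTime, outTime)
VALUES ('John Wayne', '2009-09-02 07:02:43.000', '2009-09-02 07:48:43.000');
INSERT INTO @doorVisits ([user], inTime, outTime)
VALUES ('John Wayne', NULL, '2009-09-02 09:26:43.000');

-- The requested report becomes a plain SELECT with no self-join.
SELECT [user],
       CONVERT(CHAR(10), COALESCE(inTime, outTime), 120) AS [date],
       CONVERT(CHAR(8), inTime, 108) AS inHour,
       CONVERT(CHAR(8), outTime, 108) AS outHour
FROM @doorVisits
ORDER BY [user], COALESCE(inTime, outTime);
```

With this layout a missing entry is simply a NULL in inTime, so no pairing logic is needed at query time; the pairing has to happen when the rows are written instead.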
To improve performance at the schema level:
- I suggest renaming your column accessDate to accessDateTime
- then create a PERSISTED computed column accessDate based on accessDateTime (shown below). The index you want will then only need to include the column accessDate, which you will use for exact-match comparison along with user
- make sure you have the correct indexes on the table (from the code below you will probably need one on "user", "accessDate" and including "accessType")
accessDate column definition:
accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
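As a rough sketch of how these steps might fit together, assuming a permanent table rather than the table variable from the question (CREATE INDEX cannot be run against a table variable). The names dbo.doorStatistics and IX_doorStatistics_user_accessDate are made up for illustration.

```sql
-- Hypothetical permanent table with the renamed column and the computed date column.
CREATE TABLE dbo.doorStatistics
( id INT IDENTITY PRIMARY KEY,
  [user] VARCHAR(250) NOT NULL,
  accessDateTime DATETIME NOT NULL,
  accessType VARCHAR(5) NOT NULL,
  -- Date part only (time truncated to midnight), stored physically in the row.
  accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED
);

-- Index for exact-match lookups on user and day; accessType is carried
-- in the leaf level so the IN/OUT filter does not need a lookup.
CREATE INDEX IX_doorStatistics_user_accessDate
ON dbo.doorStatistics ([user], accessDate)
INCLUDE (accessType);
```

Whether this index is actually the best fit depends on the real workload, so it is worth checking the execution plan against your own data.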
Now, given that you have done this and you are on SQL Server 2005 or later, this awfully long query should do the job:
WITH MatchIN (in_id, out_id)
AS (SELECT s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM @doorStatistics s
    LEFT JOIN @doorStatistics x
           ON x.id = (SELECT TOP 1 z.id
                      FROM @doorStatistics z
                      WHERE z."user" = s."user"
                        AND z.accessType = 'OUT'
                        AND z.accessDate = s.accessDate
                        AND z.accessDateTime >= s.accessDateTime
                      ORDER BY z.accessDateTime ASC
                     )
    LEFT JOIN @doorStatistics y
           ON y.id = (SELECT TOP 1 z.id
                      FROM @doorStatistics z
                      WHERE z."user" = s."user"
                        AND z.accessType = 'IN'
                        AND z.accessDate = s.accessDate
                        AND z.accessDateTime >= s.accessDateTime
                        AND z.accessDateTime <= x.accessDateTime
                      ORDER BY z.accessDateTime DESC
                     )
    WHERE s.accessType = 'IN'
   )
, MatchOUT (out_id, in_id)
AS (SELECT s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM @doorStatistics s
    LEFT JOIN @doorStatistics x
           ON x.id = (SELECT TOP 1 z.id
                      FROM @doorStatistics z
                      WHERE z."user" = s."user"
                        AND z.accessType = 'IN'
                        AND z.accessDate = s.accessDate
                        AND z.accessDateTime <= s.accessDateTime
                      ORDER BY z.accessDateTime DESC
                     )
    LEFT JOIN @doorStatistics y
           ON y.id = (SELECT TOP 1 z.id
                      FROM @doorStatistics z
                      WHERE z."user" = s."user"
                        AND z.accessType = 'OUT'
                        AND z.accessDate = s.accessDate
                        AND z.accessDateTime <= s.accessDateTime
                        AND z.accessDateTime >= x.accessDateTime
                      ORDER BY z.accessDateTime ASC
                     )
    WHERE s.accessType = 'OUT'
   )
SELECT COALESCE(i."user", o."user") AS "user",
       COALESCE(i.accessDate, o.accessDate) AS "date",
       CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
       CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM (SELECT in_id, out_id FROM MatchIN
      UNION -- this will eliminate duplicates at the same time
      SELECT in_id, out_id FROM MatchOUT
     ) x
LEFT JOIN @doorStatistics i
       ON i.id = x.in_id
LEFT JOIN @doorStatistics o
       ON o.id = x.out_id
ORDER BY "user", "date", "inHour"
To test the handling of missing rows, simply comment out some of your test data INSERT statements.
You need to select the minimum OUT record for each IN record for a given user, after ensuring that there is no intermediate IN record (which would correspond to someone who registered IN twice without leaving the building). This requires some modestly complex SQL (such as a NOT EXISTS clause). You will end up with a self-join on the table, plus an additional NOT EXISTS subquery on the same table. Just make sure you alias all the table references.
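One way this idea might be sketched against the table from the question is shown below. It is only an illustration, not a tuned query. It makes one simplifying assumption: instead of taking MIN over the OUT rows and separately excluding intermediate IN rows, the NOT EXISTS rejects any intermediate row of either type, which selects the same OUT row as long as a user never has two records with the same timestamp. The 07:40:00 row is a hypothetical extra row added to show the double-IN case.

```sql
-- Table definition and rows taken from the question (one user only).
DECLARE @doorStatistics TABLE
( id INT IDENTITY,
  [user] VARCHAR(250),
  accessDate DATETIME,
  accessType VARCHAR(5)
);

INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN');
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT');
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN');
-- Hypothetical extra row: a second IN with no OUT in between.
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:40:00.000','IN');
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT');

SELECT i.[user],
       i.accessDate AS [in date],
       o.accessDate AS [out date]
FROM @doorStatistics i
LEFT JOIN @doorStatistics o
       ON o.[user] = i.[user]
      AND o.accessType = 'OUT'
      AND o.accessDate > i.accessDate
      -- Reject the candidate OUT if the same user has any record
      -- strictly between this IN and that OUT.
      AND NOT EXISTS (SELECT 1
                      FROM @doorStatistics m
                      WHERE m.[user] = i.[user]
                        AND m.accessDate > i.accessDate
                        AND m.accessDate < o.accessDate)
WHERE i.accessType = 'IN'
ORDER BY i.[user], i.accessDate;
```

With these rows, the 07:02:43 entry pairs with 07:32:43, the 07:37:43 entry gets a NULL exit because another IN follows it, and the 07:40:00 entry pairs with 07:48:43. Like the query in the question, this sketch is driven by IN rows, so an OUT without a matching IN does not appear; covering that case would need a second, OUT-driven pass.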