Grouping records in one temporary table
I have a table where one column contains repeated values, but the other columns differ. Something like this:
Code  SubCode  Version  Status
1234  D1       1        A
1234  D1       0        P
1234  DA       1        A
1234  DB       1        P
5678  BB       1        A
5678  BB       0        P
5678  BP       1        A
5678  BJ       1        A
0987  HH       1        A
In the table above, each SubCode and Version pair is unique, while Code is repeated. I want to copy records from this table into a temporary table. The only entries I want to copy are those where ALL subcodes for the code have status "A", and I want each code in the temp table only once.
So from the example above, the temporary table should contain 5678 and 0987, because all subcodes for 5678 have status "A" and all subcodes for 0987 (it has only one) have status "A". 1234 is excluded because its subcode "DB" has status "P".
Any help would be appreciated!
INSERT INTO theTempTable (Code)
SELECT t.Code
FROM theTable t
LEFT OUTER JOIN theTable subT
    ON t.Code = subT.Code AND subT.status <> 'A'
WHERE subT.Code IS NULL
GROUP BY t.Code
This should do the trick. The logic is a bit tricky, but I'll do my best to explain how it works.
An outer join combined with an IS NULL check lets you search for the absence of matching rows. Combine this with the inverse of what you would normally look for (here, status = 'A'), and the query succeeds when there are no rows that fail to match. That is the same as ((no rows) OR (all rows match)). Since we know rows exist because of the other reference to the table, all rows must match.
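To see the pattern at work, here is a minimal sketch that loads the question's sample rows and runs the SELECT part of the query. The column types are assumptions (Code is stored as text so that 0987 keeps its leading zero), and the multi-row VALUES syntax may need adjusting on older database versions.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE theTable (
    Code    VARCHAR(10) NOT NULL,
    SubCode VARCHAR(10) NOT NULL,
    version INT         NOT NULL,
    status  CHAR(1)     NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO theTable (Code, SubCode, version, status) VALUES
    ('1234', 'D1', 1, 'A'),
    ('1234', 'D1', 0, 'P'),
    ('1234', 'DA', 1, 'A'),
    ('1234', 'DB', 1, 'P'),
    ('5678', 'BB', 1, 'A'),
    ('5678', 'BB', 0, 'P'),
    ('5678', 'BP', 1, 'A'),
    ('5678', 'BJ', 1, 'A'),
    ('0987', 'HH', 1, 'A');

-- Anti-join: keep a code only when no row for it has a status other than 'A'.
SELECT t.Code
FROM theTable t
LEFT OUTER JOIN theTable subT
    ON t.Code = subT.Code AND subT.status <> 'A'
WHERE subT.Code IS NULL
GROUP BY t.Code;
-- Returns only 0987 for these rows: 5678 drops out because of (BB, version 0, 'P').
```

Note that this checks every row individually, so it matches the question's expected result (5678 and 0987) only if the version 0 row for BB is meant to be ignored.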
It's a little unclear whether the version column comes into play. For example, do you want to consider only the rows with the highest version, or does any row with an "A" for a subcode count? Take 5678, BB, for example, where version 1 has "A" and version 0 has "P". Is 5678 included because at least one of the BB rows has "A", or because version 1 has "A"?
The following code assumes you want all codes where each subcode has at least one "A", regardless of version.
SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    (
        SELECT COUNT(DISTINCT subcode)
        FROM MyTable T2
        WHERE T2.code = T1.code
    ) =
    (
        SELECT COUNT(DISTINCT subcode)
        FROM MyTable T3
        WHERE T3.code = T1.code AND T3.status = 'A'
    )
Performance may be terrible if your table is large. I'll try to come up with a query that is likely to perform better; this one was just off the top of my head.
Also, if you explain the entirety of your problem, maybe we can find a way to get rid of this temp table ...;)
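As a quick check of the COUNT(DISTINCT ...) comparison, the sketch below loads the question's sample rows into MyTable (the column types are assumed) and reduces the result to one row per code, which is what the temp table needs. The SELECT DISTINCT is an addition here, not part of the query above.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE MyTable (
    code    VARCHAR(10) NOT NULL,
    subcode VARCHAR(10) NOT NULL,
    version INT         NOT NULL,
    status  CHAR(1)     NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO MyTable (code, subcode, version, status) VALUES
    ('1234', 'D1', 1, 'A'),
    ('1234', 'D1', 0, 'P'),
    ('1234', 'DA', 1, 'A'),
    ('1234', 'DB', 1, 'P'),
    ('5678', 'BB', 1, 'A'),
    ('5678', 'BB', 0, 'P'),
    ('5678', 'BP', 1, 'A'),
    ('5678', 'BJ', 1, 'A'),
    ('0987', 'HH', 1, 'A');

-- A code qualifies when its number of distinct subcodes equals
-- its number of distinct subcodes that have at least one 'A' row.
SELECT DISTINCT T1.code
FROM MyTable T1
WHERE
    (
        SELECT COUNT(DISTINCT T2.subcode)
        FROM MyTable T2
        WHERE T2.code = T1.code
    ) =
    (
        SELECT COUNT(DISTINCT T3.subcode)
        FROM MyTable T3
        WHERE T3.code = T1.code AND T3.status = 'A'
    );
-- Returns 5678 and 0987; 1234 has three subcodes but only two (D1, DA) with an 'A'.
```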
Here are two more possible methods. They still use several subqueries, but they look like they will perform better than the method above. The two are very similar, although the second one had a better query plan in my DB. Of course, that was with limited data and no indexing, so it is not much of a test. You should try all of the methods and see which works best for your database.
SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
        SELECT *
        FROM MyTable T2
        WHERE T2.code = T1.code
          AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
        SELECT *
        FROM MyTable T3
        LEFT OUTER JOIN MyTable T4 ON
            T4.code = T3.code AND
            T4.subcode = T3.subcode AND
            T4.status = 'A'
        WHERE T3.code = T1.code
          AND T3.status <> 'A'
          AND T4.code IS NULL
    )

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
        SELECT *
        FROM MyTable T2
        WHERE T2.code = T1.code
          AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
        SELECT *
        FROM MyTable T3
        WHERE T3.code = T1.code
          AND T3.status <> 'A'
          AND NOT EXISTS
          (
              SELECT *
              FROM MyTable T4
              WHERE T4.code = T3.code
                AND T4.subcode = T3.subcode
                AND T4.status = 'A'
          )
    )
Here's my solution
SELECT Code
FROM
(
    SELECT
        Code,
        COUNT(SubCode) as SubCodeCount,
        SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
            as SubCodeCountWithA
    FROM
    (
        SELECT
            Code,
            SubCode,
            SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
                as ACount
        FROM CodeTable
        GROUP BY Code, SubCode
    ) sub
    GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount
Let me break it down from the inside out.
SELECT
    Code,
    SubCode,
    SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
        as ACount
FROM CodeTable
GROUP BY Code, SubCode
Group by code and subcode (each row is now a distinct pairing of code and subcode) and count how many A's occur in each pairing.
SELECT
    Code,
    COUNT(SubCode) as SubCodeCount,
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
        as SubCodeCountWithA
FROM
    --previous
GROUP BY Code
Regroup those pairings by code (each row is now a code) and count the number of subcodes and the number of subcodes that have at least one A.
SELECT Code
FROM
--previous
WHERE SubCodeCountWithA = SubCodeCount
Keep only the codes whose total number of subcodes equals the number of subcodes that have an A.
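For comparison, the same two-level grouping can be written more compactly with HAVING. This is a variant sketch rather than the query above: it flags each (Code, SubCode) pairing with MAX instead of counting A's, and the table definition uses assumed column types with the sample rows from the question.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE CodeTable (
    Code    VARCHAR(10) NOT NULL,
    SubCode VARCHAR(10) NOT NULL,
    Version INT         NOT NULL,
    Status  CHAR(1)     NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO CodeTable (Code, SubCode, Version, Status) VALUES
    ('1234', 'D1', 1, 'A'),
    ('1234', 'D1', 0, 'P'),
    ('1234', 'DA', 1, 'A'),
    ('1234', 'DB', 1, 'P'),
    ('5678', 'BB', 1, 'A'),
    ('5678', 'BB', 0, 'P'),
    ('5678', 'BP', 1, 'A'),
    ('5678', 'BJ', 1, 'A'),
    ('0987', 'HH', 1, 'A');

-- Inner query: one row per (Code, SubCode), HasA = 1 if any version has status 'A'.
-- Outer query: keep a code only when every one of its subcodes has HasA = 1.
SELECT Code
FROM
(
    SELECT
        Code,
        SubCode,
        MAX(CASE WHEN Status = 'A' THEN 1 ELSE 0 END) AS HasA
    FROM CodeTable
    GROUP BY Code, SubCode
) sub
GROUP BY Code
HAVING MIN(HasA) = 1;
-- Returns 5678 and 0987; subcode DB of 1234 has HasA = 0.
```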
In your SELECT, add a WHERE clause along these lines:
Select [stuff]
From Table T
Where Exists
    (Select * From Table
     Where Code = T.Code
       And Status = 'A')
  And Not Exists
    (Select * From Table I
     Where Code = T.Code
       And Not Exists
         (Select * From Table
          Where Code = I.Code
            And SubCode = I.SubCode
            And Status = 'A'))
In English: show me the rows where the code has at least one row with status "A", and where the code has NO subcode that lacks a row with status "A".
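To tie this back to the original goal of filling a temp table once per code, here is a hedged sketch of the same nested EXISTS / NOT EXISTS logic. SourceTable, its column types, and the aliases S, I, and X are placeholders invented for the example, and theTempTable is created as an ordinary table because temp-table syntax varies between databases.

```sql
-- Hypothetical source table; the column types are assumptions.
CREATE TABLE SourceTable (
    Code    VARCHAR(10) NOT NULL,
    SubCode VARCHAR(10) NOT NULL,
    Version INT         NOT NULL,
    Status  CHAR(1)     NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO SourceTable (Code, SubCode, Version, Status) VALUES
    ('1234', 'D1', 1, 'A'),
    ('1234', 'D1', 0, 'P'),
    ('1234', 'DA', 1, 'A'),
    ('1234', 'DB', 1, 'P'),
    ('5678', 'BB', 1, 'A'),
    ('5678', 'BB', 0, 'P'),
    ('5678', 'BP', 1, 'A'),
    ('5678', 'BJ', 1, 'A'),
    ('0987', 'HH', 1, 'A');

-- Stand-in for the temp table (use your database's temp-table syntax instead).
CREATE TABLE theTempTable (Code VARCHAR(10) NOT NULL);

-- DISTINCT ensures each qualifying code is inserted only once.
INSERT INTO theTempTable (Code)
SELECT DISTINCT T.Code
FROM SourceTable T
WHERE EXISTS
    (SELECT * FROM SourceTable S
     WHERE S.Code = T.Code
       AND S.Status = 'A')
  AND NOT EXISTS
    (SELECT * FROM SourceTable I
     WHERE I.Code = T.Code
       AND NOT EXISTS
         (SELECT * FROM SourceTable X
          WHERE X.Code = I.Code
            AND X.SubCode = I.SubCode
            AND X.Status = 'A'));

SELECT Code FROM theTempTable;
-- Contains 5678 and 0987, once each.
```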