Grouping records in one temporary table

I have a table where one column contains repeated values, but the other columns differ. Something like this:

Code  SubCode  Version  Status
1234  D1       1        A
1234  D1       0        P
1234  DA       1        A
1234  DB       1        P
5678  BB       1        A
5678  BB       0        P
5678  BP       1        A
5678  BJ       1        A
0987  HH       1        A

In the above table, SubCode and Version together are unique, while Code is repeated. I want to transfer records from this table to a temporary table. The only entries I want to transfer are those where ALL subcodes for the code have status "A", and I only want each code in the temp table once.

So for the above example, the temporary table must contain 5678 and 0987, because all subcodes for 5678 have status "A" and all subcodes for 0987 (it has only one) have status "A". 1234 is excluded because its subcode "DB" has status "P".

Any help would be appreciated!

4 answers


INSERT theTempTable (Code)
SELECT t.Code
FROM   theTable t
       LEFT OUTER JOIN theTable subT ON (t.Code = subT.Code AND subT.status <> 'A')
WHERE  subT.Code IS NULL
GROUP BY t.Code

      

This should do the trick. The logic is a bit tricky, but I will do my best to explain how it works.

An outer join combined with an IS NULL check lets you search for rows that are missing. Combine this with the inverse of the condition you actually want (here the wanted condition is status = 'A', so the join looks for status <> 'A'), and the query succeeds when no rows fail the condition. That is the same as ((no rows) OR (all rows match)). Since we know rows exist, because they come from the other reference to the table (t), all rows must match.
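To make the anti-join easier to try out, here is a minimal sketch that loads the question's sample rows and runs the same SELECT. The table name theTable comes from this answer; the column types and the multi-row VALUES syntax are assumptions and may need adjusting for your database.

```sql
-- Hypothetical setup: column types are assumptions, not taken from the question.
-- Code is stored as text so that the leading zero in '0987' is preserved.
CREATE TABLE theTable (
    Code    VARCHAR(10) NOT NULL,
    SubCode VARCHAR(10) NOT NULL,
    Version INT         NOT NULL,
    Status  CHAR(1)     NOT NULL
);

-- The nine sample rows from the question.
INSERT INTO theTable (Code, SubCode, Version, Status) VALUES
    ('1234', 'D1', 1, 'A'),
    ('1234', 'D1', 0, 'P'),
    ('1234', 'DA', 1, 'A'),
    ('1234', 'DB', 1, 'P'),
    ('5678', 'BB', 1, 'A'),
    ('5678', 'BB', 0, 'P'),
    ('5678', 'BP', 1, 'A'),
    ('5678', 'BJ', 1, 'A'),
    ('0987', 'HH', 1, 'A');

-- Anti-join: keep a code only if no row with that code has a status other than 'A'.
SELECT t.Code
FROM   theTable t
       LEFT OUTER JOIN theTable subT
           ON t.Code = subT.Code AND subT.Status <> 'A'
WHERE  subT.Code IS NULL
GROUP BY t.Code;
-- With the sample rows this returns only 0987, because 1234 and 5678
-- each have at least one row whose Status is 'P'.
```

Note that this reading checks every row, including older versions. The question expects 5678 to qualify as well, so if older versions with status "P" should be ignored, the condition has to be applied per subcode rather than per row.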


It is a little unclear whether the version column comes into play. For example, do you only want to consider the rows with the highest version, or does a subcode count if any of its rows has an "A"? Take 5678, BB, for example, where version 1 has "A" and version 0 has "P". Is 5678 included because at least one of the BB rows has "A", or because the latest version (1) has "A"?

The following code assumes you want all codes where each subcode has at least one "A", regardless of version.

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    (
      SELECT COUNT(DISTINCT subcode)
      FROM MyTable T2
      WHERE T2.code = T1.code
    ) =
    (
      SELECT COUNT(DISTINCT subcode)
      FROM MyTable T3
      WHERE T3.code = T1.code AND T3.status = 'A'
    )

      

Performance may be terrible if your table is large. This was just off the top of my head; I will try to come up with a query that is likely to perform better.
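One possible way to express the same count comparison in a single grouped pass, which also returns each code only once as the question asks, is sketched below. It assumes the same MyTable and column names used in this answer, and it relies on COUNT(DISTINCT ...) ignoring NULLs, which is standard behaviour. Whether it actually performs better depends on your data and indexes.

```sql
-- Sketch: one row per code, kept only when every distinct subcode
-- has at least one row with status 'A'.
SELECT code
FROM MyTable
GROUP BY code
HAVING COUNT(DISTINCT subcode) =
       -- The CASE yields NULL for non-'A' rows, and COUNT skips NULLs,
       -- so this counts only subcodes that have an 'A' row.
       COUNT(DISTINCT CASE WHEN status = 'A' THEN subcode END);
```

On the question's sample rows this keeps 5678 and 0987 and drops 1234, whose subcode DB has no "A" row.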



Also, if you explain the entirety of your problem, maybe we can find a way to get rid of this temp table ...;)

Here are two more possible methods. They still use several subqueries, but they look like they will perform better than the method above. The two are very similar, although the second one had the better query plan in my DB. That was with limited data and no indexing, so it is not much of a test. You should try all the methods and see which works best for your database.

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
        SELECT *
        FROM MyTable T2
        WHERE T2.code = T1.code
          AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
        SELECT *
        FROM MyTable T3
        LEFT OUTER JOIN MyTable T4 ON
            T4.code = T3.code AND
            T4.subcode = T3.subcode AND
            T4.status = 'A'
        WHERE T3.code = T1.code
          AND T3.status <> 'A'
          AND T4.code IS NULL
    )

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
        SELECT *
        FROM MyTable T2
        WHERE T2.code = T1.code
          AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
        SELECT *
        FROM MyTable T3
        WHERE T3.code = T1.code
          AND T3.status <> 'A'
          AND NOT EXISTS
            (
                SELECT *
                FROM MyTable T4
                WHERE T4.code = T3.code
                  AND T4.subcode = T3.subcode
                  AND T4.status = 'A'
            )
    )

      


Here's my solution

SELECT Code
FROM
(
  SELECT
    Code,
    COUNT(SubCode) as SubCodeCount,
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
      as SubCodeCountWithA
  FROM
  (
    SELECT
      Code,
      SubCode,
      SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
        as ACount
    FROM CodeTable
    GROUP BY Code, SubCode
  ) sub
  GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount

      

Let me break it down from the inside out.

    SELECT
      Code,
      SubCode,
      SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
        as ACount
    FROM CodeTable
    GROUP BY Code, SubCode

      

Group by code and subcode (each row is a distinct pairing of code and subcode) and count how many A's occur in each pairing.

  SELECT
    Code,
    COUNT(SubCode) as SubCodeCount,
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
      as SubCodeCountWithA
  FROM
    --previous
  GROUP BY Code

      

Regroup these pairs by code (now each row is a code) and count the number of subcodes and the number of subcodes that have at least one A.

SELECT Code
FROM
  --previous
WHERE SubCodeCountWithA = SubCodeCount

      

Keep the codes whose total number of subcodes equals the number of subcodes that have an A.
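Because this query already returns one row per code, its result can be loaded straight into the temporary table the question asks about. The following is only a sketch: TempCodes is a hypothetical name, the column type is an assumption, and real temporary-table syntax differs between databases (for example a # prefix or CREATE TEMPORARY TABLE).

```sql
-- TempCodes is a hypothetical name; swap in your engine's temp-table syntax.
CREATE TABLE TempCodes (
    Code VARCHAR(10) NOT NULL PRIMARY KEY  -- type is an assumption
);

INSERT INTO TempCodes (Code)
SELECT Code
FROM
(
  -- Per code: total subcodes and subcodes that have at least one 'A'.
  SELECT
    Code,
    COUNT(SubCode) AS SubCodeCount,
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END) AS SubCodeCountWithA
  FROM
  (
    -- Per (code, subcode) pair: how many rows have status 'A'.
    SELECT
      Code,
      SubCode,
      SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END) AS ACount
    FROM CodeTable
    GROUP BY Code, SubCode
  ) sub
  GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount;
```

The primary key is a design choice rather than a requirement: it simply makes the "each code only once" expectation explicit.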



In your select, add a WHERE clause like this:

Select [stuff]
From Table T
Where Exists
    (Select * From Table 
     Where Code = T.Code
        And Status = 'A')
  And Not Exists
    (Select * From Table I
     Where Code = T.Code 
       And Not Exists
          (Select * From Table
           Where Code = I.Code
               And SubCode = I.SubCode
               And Status = 'A'))

      

In English: show me the rows for codes where at least one row has status "A", and where there is NO subcode under that code that lacks a row with status "A".
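As a concrete illustration, here is one way the placeholders might be filled in. The table name CodeTable and the aliases are assumptions (the word Table on its own is a reserved word in most databases), and DISTINCT is added so that each code appears only once, as the question requests.

```sql
-- Sketch with hypothetical names: CodeTable stands in for the real table.
SELECT DISTINCT T.Code
FROM CodeTable T
WHERE EXISTS
    -- At least one row for this code has status 'A'.
    (SELECT *
     FROM CodeTable A
     WHERE A.Code = T.Code
       AND A.Status = 'A')
  AND NOT EXISTS
    -- No subcode under this code is left without an 'A' row.
    (SELECT *
     FROM CodeTable I
     WHERE I.Code = T.Code
       AND NOT EXISTS
          (SELECT *
           FROM CodeTable A2
           WHERE A2.Code = I.Code
             AND A2.SubCode = I.SubCode
             AND A2.Status = 'A'));
```

Giving every subquery its own alias is not strictly needed, but it makes it easier to see which copy of the table each column refers to.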
