Intersection Complement in SQL
I am using Oracle SQL and I have a basic question about the join command.
I have 5 tables. Each of them has the same column as its primary key: ID (int). Let's take a look at the following queries:
select count(*) from table_a -- 100 records
select count(*) from table_b -- 200 records
select count(*) from table_c -- 150 records
select count(*) from table_d -- 100 records
select count(*) from table_e -- 120 records
select * -- 88 records
from table_a a
inner join table_b b
on a.id = b.id
inner join table_c c
on a.id = c.id
inner join table_d d
on a.id = d.id
inner join table_e e
on a.id = e.id
In this case, a record is dropped from the output as soon as one of the tables does not contain its ID, even if all the other tables do. How can I find out what these "bad" records are? I think this is actually the complement of the intersection.
I want to know which records are the problem and which table is responsible in each case. For example: ID 123 is a bad record because it is not included in table_c, but is included in the other tables. ID 321 is problematic because it is included in all tables except table_d.
You are probably looking for a symmetric difference between all of your tables.
To solve this problem without being too clever, you can use FULL OUTER JOIN ... USING:
SELECT id
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE table_a.ROWID IS NULL
OR table_b.ROWID IS NULL
OR table_c.ROWID IS NULL
OR table_d.ROWID IS NULL
OR table_e.ROWID IS NULL;
FULL OUTER JOIN returns all rows that satisfy the join condition (like a normal JOIN), as well as all rows that have no matching row on the other side. The USING clause applies an implicit COALESCE to the equijoin column.
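To make the implicit COALESCE concrete, here is a rough sketch of what the same kind of query looks like with explicit ON conditions, shown for three of the tables only. The aliases a, b and c are just for illustration. Each later join has to compare against the COALESCE of the ids seen so far, which is the bookkeeping that USING does for you.

```sql
-- Sketch: ON-based equivalent of FULL OUTER JOIN ... USING(id), three tables only.
-- Assumes id is the (non-null) primary key of every table, as in the question.
SELECT COALESCE(a.id, b.id, c.id) AS id
FROM table_a a
FULL OUTER JOIN table_b b
  ON b.id = a.id
FULL OUTER JOIN table_c c
  ON c.id = COALESCE(a.id, b.id)   -- match against whichever side supplied the id
WHERE a.id IS NULL                 -- id missing from table_a
   OR b.id IS NULL                 -- id missing from table_b
   OR c.id IS NULL;                -- id missing from table_c
```

Extending this to five tables means growing the COALESCE argument list at every step, which is why the USING form is easier to read.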
Another option is to use an anti-join:
SELECT id
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE id NOT IN (
SELECT id
FROM table_a
INNER JOIN table_b USING(id)
INNER JOIN table_c USING(id)
INNER JOIN table_d USING(id)
INNER JOIN table_e USING(id)
)
Basically, this builds the union of all sets minus the intersection of all sets.
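The same "union minus intersection" idea can also be spelled out directly with Oracle's set operators. This is only a sketch of an alternative formulation, not part of the answer's original queries. The parentheses matter, because Oracle evaluates set operators of equal precedence from left to right.

```sql
-- Sketch: union of all ids minus the ids present in every table
(
  SELECT id FROM table_a
  UNION
  SELECT id FROM table_b
  UNION
  SELECT id FROM table_c
  UNION
  SELECT id FROM table_d
  UNION
  SELECT id FROM table_e
)
MINUS
(
  SELECT id FROM table_a
  INTERSECT
  SELECT id FROM table_b
  INTERSECT
  SELECT id FROM table_c
  INTERSECT
  SELECT id FROM table_d
  INTERSECT
  SELECT id FROM table_e
);
```

Like the two queries above, this only lists the problematic ids; it does not say which table is missing each one.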
Graphically, you can compare INNER JOIN and OUTER JOIN (on 3 tables only, for ease of presentation):
[figure not available: INNER JOIN compared with OUTER JOIN on 3 tables]
Given this test case:
ID | TABLE_A | TABLE_B | TABLE_C | TABLE_D | TABLE_E
1 | * | - | - | - | -
2 | - | * | * | * | *
3 | * | - | - | * | -
4 | * | * | * | * | *
* : entry present in the table
- : no entry in the table
Both queries will return:
ID
1
3
2
If you want a tabular result, you can adapt one of these queries by adding a bunch of CASE expressions. Something like this:
SELECT ID,
CASE when table_a.rowid is not null then 1 else 0 END table_a,
CASE when table_b.rowid is not null then 1 else 0 END table_b,
CASE when table_c.rowid is not null then 1 else 0 END table_c,
CASE when table_d.rowid is not null then 1 else 0 END table_d,
CASE when table_e.rowid is not null then 1 else 0 END table_e
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE table_a.ROWID IS NULL
OR table_b.ROWID IS NULL
OR table_c.ROWID IS NULL
OR table_d.ROWID IS NULL
OR table_e.ROWID IS NULL;
Output:
ID | TABLE_A | TABLE_B | TABLE_C | TABLE_D | TABLE_E
1 | 1 | 0 | 0 | 0 | 0
3 | 1 | 0 | 0 | 1 | 0
2 | 0 | 1 | 1 | 1 | 1
1 : entry present in the table
0 : no entry in the table
Try the following:
SELECT id FROM (
SELECT id FROM table_a
UNION
SELECT id FROM table_b
UNION
SELECT id FROM table_c
UNION
SELECT id FROM table_d
UNION
SELECT id FROM table_e
) result
WHERE id NOT IN ( select a.id from table_a a
inner join table_b b
on a.id = b.id
inner join table_c c
on a.id = c.id
inner join table_d d
on a.id = d.id
inner join table_e e
on a.id = e.id )
If I understand correctly, you can use outer joins to determine which rows do not have matching primary (or unique) keys. For example, use a left join to find the rows of table a that have no match in table b:
select a.id from a left join b on a.id=b.id where b.id is null
Conversely, to find the rows of table b that have no match in table a:
select b.id from a right join b on a.id=b.id where a.id is null
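For comparison, the same "rows without a match" check is often written with NOT EXISTS. The sketch below keeps the table names a and b from this answer and should return the same ids as the left join version, assuming id is unique in both tables.

```sql
-- Sketch: ids in a that have no matching id in b
SELECT a.id
FROM a
WHERE NOT EXISTS (
  SELECT 1
  FROM b
  WHERE b.id = a.id
);
```

Swapping a and b gives the counterpart of the right join query.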
This solution will tell you which tables do not have rows for each ID:
SELECT *
FROM (SELECT id, 'table_a' AS table_name FROM table_a
UNION ALL
SELECT id, 'table_b' FROM table_b
UNION ALL
SELECT id, 'table_c' FROM table_c
UNION ALL
SELECT id, 'table_d' FROM table_d
UNION ALL
SELECT id, 'table_e' FROM table_e) PIVOT (COUNT (*)
FOR table_name
IN ('table_a' AS table_a,
'table_b' AS table_b,
'table_c' AS table_c,
'table_d' AS table_d,
'table_e' AS table_e))
WHERE table_a + table_b + table_c + table_d + table_e < 5
ORDER BY id
Result example:
ID | TABLE_A | TABLE_B | TABLE_C | TABLE_D | TABLE_E
0 | 1 | 0 | 0 | 1 | 0
1 | 0 | 1 | 0 | 1 | 0
2 | 1 | 1 | 0 | 0 | 0
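If one column per table is more than you need, a variation on the same UNION ALL idea is to aggregate the table names into a single list per ID. This is only a sketch: the column name present_in is made up for illustration, and LISTAGG requires Oracle 11g Release 2 or later.

```sql
-- Sketch: for each incomplete id, list the tables that do contain it.
-- Assumes id is unique within each table, so COUNT(*) is the number of tables.
SELECT id,
       LISTAGG(table_name, ', ') WITHIN GROUP (ORDER BY table_name) AS present_in
FROM (SELECT id, 'table_a' AS table_name FROM table_a
      UNION ALL
      SELECT id, 'table_b' FROM table_b
      UNION ALL
      SELECT id, 'table_c' FROM table_c
      UNION ALL
      SELECT id, 'table_d' FROM table_d
      UNION ALL
      SELECT id, 'table_e' FROM table_e)
GROUP BY id
HAVING COUNT(*) < 5   -- keep only ids missing from at least one table
ORDER BY id;
```

Any table name absent from present_in is a table that lacks that ID.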