Select the combined rows of multiple tables that refer to the same (super) table
I came up with an OO-like design for my database tables: a "super-table" holds the columns shared by all of my sub-tables, and each sub-table row points to its super-table row by rowid (super_id).
Like this:
CREATE TABLE SuperTable (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created DATETIME
);
CREATE TABLE SubTable1 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    super_id INTEGER REFERENCES SuperTable(id),
    additionalData TEXT
);
CREATE TABLE SubTable2 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    super_id INTEGER REFERENCES SuperTable(id),
    moreData BLOB
);
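To make the design concrete, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. It assumes the schema above. `insert_child` is a hypothetical helper (not part of the question) that creates one SuperTable row and then the matching sub-table row inside a single transaction, so the two cannot get out of step. SQLite only enforces REFERENCES clauses when `PRAGMA foreign_keys = ON` is set on the connection, so here they are documentation only.

```python
import sqlite3

# Schema from the question, run in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        super_id INTEGER REFERENCES SuperTable(id),
                        additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        super_id INTEGER REFERENCES SuperTable(id),
                        moreData BLOB);
""")


def insert_child(table, column, created, value):
    """Hypothetical helper: one "object" = one SuperTable row + one sub-table row.

    `table` and `column` are interpolated into the SQL, so they must be
    trusted constants, never user input.
    """
    with conn:  # commits both inserts together, or rolls both back on error
        super_id = conn.execute(
            "INSERT INTO SuperTable (created) VALUES (?)", (created,)
        ).lastrowid
        conn.execute(
            f"INSERT INTO {table} (super_id, {column}) VALUES (?, ?)",
            (super_id, value),
        )
    return super_id


first = insert_child("SubTable1", "additionalData", "a", "additional 1")
second = insert_child("SubTable2", "moreData", "b", b"\x01\x02")
print(first, second)  # 1 2
```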
For every record in any sub-table there is exactly one corresponding record in SuperTable, and vice versa.
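Nothing in the schema enforces this one-to-one rule. As a hedged sketch (Python's sqlite3, with sample rows made up for illustration), the query below lists every super_id that is claimed by more than one row across the sub-tables being checked. SuperTable rows that belong to sub-tables left out of the check cannot be flagged this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id, created) VALUES (1, 'a'), (2, 'b');
INSERT INTO SubTable1 (super_id, additionalData) VALUES (1, 'x');
-- deliberately broken sample: super_id 1 is also claimed by SubTable2
INSERT INTO SubTable2 (super_id, moreData) VALUES (1, 'y'), (2, 'z');
""")

# super_ids referenced by more than one sub-table row (violates the 1:1 rule)
violations = conn.execute("""
    SELECT super_id, COUNT(*) AS n
    FROM (SELECT super_id FROM SubTable1
          UNION ALL
          SELECT super_id FROM SubTable2)
    GROUP BY super_id
    HAVING COUNT(*) > 1
""").fetchall()
print(violations)  # [(1, 2)]
```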
Now I would like to query all sub-tables, getting one row for each record in SuperTable together with the associated data from the matching sub-table.
I came up with this:
SELECT * FROM SuperTable
LEFT OUTER JOIN SubTable1 ON SubTable1.super_id = SuperTable.id
LEFT OUTER JOIN SubTable2 ON SubTable2.super_id = SuperTable.id
WHERE
SubTable1.super_id IS NOT NULL OR
SubTable2.super_id IS NOT NULL
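A minimal sketch that runs this query with Python's sqlite3 module. The sample rows are reconstructed from the output table in this question (SuperTable rows 1 and 3 stand in for objects stored in sub-tables that are not part of the query). ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
-- rows 1 and 3 stand for objects stored in sub-tables not included here
INSERT INTO SuperTable (id, created) VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO SubTable1 (id, super_id, additionalData) VALUES (3, 4, 'additional 3');
INSERT INTO SubTable2 (id, super_id, moreData) VALUES (1, 2, 'more of 1'), (2, 5, 'more of 2');
""")

JOINS = """
SELECT * FROM SuperTable
LEFT OUTER JOIN SubTable1 ON SubTable1.super_id = SuperTable.id
LEFT OUTER JOIN SubTable2 ON SubTable2.super_id = SuperTable.id
"""
# Without WHERE: every SuperTable row, including the "empty" ones.
all_rows = conn.execute(JOINS + "ORDER BY SuperTable.id").fetchall()
# With WHERE: only rows that matched at least one of the joined sub-tables.
filtered = conn.execute(
    JOINS
    + "WHERE SubTable1.super_id IS NOT NULL OR SubTable2.super_id IS NOT NULL "
    + "ORDER BY SuperTable.id"
).fetchall()

print(len(all_rows))  # 5
for row in filtered:
    print(row)
# (2, 'b', None, None, None, 1, 2, 'more of 1')
# (4, 'd', 3, 4, 'additional 3', None, None, None)
# (5, 'e', None, None, None, 2, 5, 'more of 2')
```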
Without the WHERE clause I would get quite a few rows in which the columns of both sub-tables are NULL. That is caused by the OUTER JOIN, because SuperTable is also used by other sub-tables that I did not include in this query.
Here's an example of output without a WHERE clause:
id  created  id  super_id  additionalData  id  super_id  moreData
--  -------  --  --------  --------------  --  --------  ---------
1   a
2   b                                      1   2         more of 1
3   c
4   d        3   4         additional 3
5   e                                      2   5         more of 2
Rows 1 and 3 above are empty (they have no sub-table data) and should be removed from the results. I currently do that with the WHERE clause.
I wonder whether there is a better way to select only the rows for the chosen sub-tables. For example, one that does not first collect all rows from SuperTable and only afterwards filter out those that did not match any of the joined tables.
I'm using SQLite at the moment, but a more general answer would be appreciated as well.
BTW, here's the test database I'm using with the examples above: SO_30595895.sqlite
There are two ways to avoid duplicates (caused by the FKs not being unique): 1) use EXISTS:
SELECT s.*
FROM supertable s
WHERE EXISTS ( SELECT 1 FROM subtable1 x
WHERE x.super_id = s.id)
OR EXISTS ( SELECT 1 FROM subtable2 x
WHERE x.super_id = s.id)
-- OR EXISTS ...
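A sketch of the EXISTS variant run through Python's sqlite3 module. It uses the same sample rows as the question, reconstructed from the question's output table. The query returns only the SuperTable columns, so the sub-table data would still have to be joined or looked up separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id, created) VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO SubTable1 (id, super_id, additionalData) VALUES (3, 4, 'additional 3');
INSERT INTO SubTable2 (id, super_id, moreData) VALUES (1, 2, 'more of 1'), (2, 5, 'more of 2');
""")

# A SuperTable row is kept if at least one sub-table row points at it.
rows = conn.execute("""
SELECT s.*
FROM SuperTable s
WHERE EXISTS (SELECT 1 FROM SubTable1 x WHERE x.super_id = s.id)
   OR EXISTS (SELECT 1 FROM SubTable2 x WHERE x.super_id = s.id)
ORDER BY s.id
""").fetchall()
print(rows)  # [(2, 'b'), (4, 'd'), (5, 'e')]
```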
Or, 2) first combine the sub-tables' FKs with UNION and join the result with supertable:
SELECT s.*
FROM supertable s
JOIN ( SELECT DISTINCT super_id AS id
FROM subtable1
UNION
SELECT DISTINCT super_id AS id
FROM subtable2
-- union ...
) x ON x.id = s.id
;
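The UNION variant, run on the same reconstructed sample rows. UNION already removes duplicate values, so this sketch drops the DISTINCT keywords. The result should match the EXISTS version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id, created) VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO SubTable1 (id, super_id, additionalData) VALUES (3, 4, 'additional 3');
INSERT INTO SubTable2 (id, super_id, moreData) VALUES (1, 2, 'more of 1'), (2, 5, 'more of 2');
""")

# Collect the distinct super_ids first, then join them back to SuperTable.
rows = conn.execute("""
SELECT s.*
FROM SuperTable s
JOIN (SELECT super_id AS id FROM SubTable1
      UNION
      SELECT super_id AS id FROM SubTable2) x ON x.id = s.id
ORDER BY s.id
""").fetchall()
print(rows)  # [(2, 'b'), (4, 'd'), (5, 'e')]
```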
UPDATE: 3) if you also want a (boolean) indicator of existence in each of the sub-tables, you can use EXISTS (...) as a scalar expression in the select list:
SELECT s.*
, (EXISTS ( SELECT 1 FROM subtable1 x
WHERE x.super_id = s.id)) AS exists_in_1
, (EXISTS ( SELECT 1 FROM subtable2 x
WHERE x.super_id = s.id)) AS exists_in_2
-- , ...
FROM supertable s
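Running the indicator query on the reconstructed sample rows shows that, unlike 1) and 2), it returns every SuperTable row. In SQLite each EXISTS expression evaluates to the integer 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id, created) VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO SubTable1 (id, super_id, additionalData) VALUES (3, 4, 'additional 3');
INSERT INTO SubTable2 (id, super_id, moreData) VALUES (1, 2, 'more of 1'), (2, 5, 'more of 2');
""")

# One 0/1 flag per sub-table; no rows are filtered out.
rows = conn.execute("""
SELECT s.*,
       (EXISTS (SELECT 1 FROM SubTable1 x WHERE x.super_id = s.id)) AS exists_in_1,
       (EXISTS (SELECT 1 FROM SubTable2 x WHERE x.super_id = s.id)) AS exists_in_2
FROM SuperTable s
ORDER BY s.id
""").fetchall()
for row in rows:
    print(row)
# (1, 'a', 0, 0)
# (2, 'b', 0, 1)
# (3, 'c', 0, 0)
# (4, 'd', 1, 0)
# (5, 'e', 0, 1)
```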
Note that when you have a data set like this:
[SuperTable]    [SubTable1]     [SubTable2]
ID              ID | stID       ID | stID
----            ---+-------     ---+-------
1               1  | 1          1  | 2
2               2  | 1          2  | 2
the result of using multiple LEFT JOINs is as follows:
ID | ID | sID | ID | sID
----+-------+-------+-------+-------
1 | 1 | 1 | NULL | NULL
1 | 2 | 1 | NULL | NULL
2 | NULL | NULL | 1 | 2
2 | NULL | NULL | 2 | 2
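A minimal reproduction of that fan-out with Python's sqlite3, using the two-row data set above. That data set deliberately breaks the question's one-to-one rule, because several sub-table rows point at the same super_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id) VALUES (1), (2);
-- two SubTable1 rows point at super row 1, two SubTable2 rows at super row 2
INSERT INTO SubTable1 (id, super_id) VALUES (1, 1), (2, 1);
INSERT INTO SubTable2 (id, super_id) VALUES (1, 2), (2, 2);
""")

rows = conn.execute("""
SELECT s.id, t1.id, t1.super_id, t2.id, t2.super_id
FROM SuperTable s
LEFT OUTER JOIN SubTable1 t1 ON t1.super_id = s.id
LEFT OUTER JOIN SubTable2 t2 ON t2.super_id = s.id
ORDER BY s.id, t1.id, t2.id
""").fetchall()
for row in rows:
    print(row)
# (1, 1, 1, None, None)
# (1, 2, 1, None, None)
# (2, None, None, 1, 2)
# (2, None, None, 2, 2)
```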
Therefore, I suggest you use this query:
SELECT s.*, st1.*, st2.*
FROM SuperTable s
-- keep only the row with the lowest id per super_id; the derived tables
-- replace the nested "JOIN ... ON ... ON ..." form, which SQLite rejects
LEFT OUTER JOIN
    (SELECT t.*
     FROM SubTable1 t
     JOIN (SELECT MIN(id) AS id FROM SubTable1 GROUP BY super_id) m1
       ON m1.id = t.id) st1
  ON st1.super_id = s.id
LEFT OUTER JOIN
    (SELECT t.*
     FROM SubTable2 t
     JOIN (SELECT MIN(id) AS id FROM SubTable2 GROUP BY super_id) m2
       ON m2.id = t.id) st2
  ON st2.super_id = s.id
WHERE
    COALESCE(st1.super_id, st2.super_id, -2) <> -2
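A sketch of the same MIN(id)-per-super_id idea, run against that data set in SQLite. It assumes the row with the lowest id is the one you want to keep. The sub-table rows are joined through derived tables so that SQLite can parse the query. The COALESCE sentinel drops SuperTable rows that match neither sub-table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created DATETIME);
CREATE TABLE SubTable1 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, additionalData TEXT);
CREATE TABLE SubTable2 (id INTEGER PRIMARY KEY AUTOINCREMENT, super_id INTEGER, moreData BLOB);
INSERT INTO SuperTable (id) VALUES (1), (2);
INSERT INTO SubTable1 (id, super_id) VALUES (1, 1), (2, 1);
INSERT INTO SubTable2 (id, super_id) VALUES (1, 2), (2, 2);
""")

rows = conn.execute("""
SELECT s.id, st1.id, st1.super_id, st2.id, st2.super_id
FROM SuperTable s
LEFT OUTER JOIN (SELECT t.* FROM SubTable1 t
                 JOIN (SELECT MIN(id) AS id FROM SubTable1 GROUP BY super_id) m
                   ON m.id = t.id) st1
  ON st1.super_id = s.id
LEFT OUTER JOIN (SELECT t.* FROM SubTable2 t
                 JOIN (SELECT MIN(id) AS id FROM SubTable2 GROUP BY super_id) m
                   ON m.id = t.id) st2
  ON st2.super_id = s.id
WHERE COALESCE(st1.super_id, st2.super_id, -2) <> -2
ORDER BY s.id
""").fetchall()
for row in rows:
    print(row)
# (1, 1, 1, None, None)
# (2, None, None, 1, 2)
```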