Stuck with optimizing boolean SQL queries in relational algebra (OR in WHERE)
I am stuck optimizing this SQL query in relational algebra:
SELECT * FROM R1, R2, R3, R4
WHERE (R1.A = '1' OR (R2.B = '2' AND R3.C = R4.C)) AND R4.D = '4'
I translated it to the following Relational Algebra statement:
σ{R1.A='1' ∨ (R2.B='2' ∧ R3.C=R4.C) ∧ R4.D='4'}(R1 × R2 × R3 × R4)
My problem is that I don't know how to optimize the where clause. I know that I can convert the last condition to σ{R4.D='4'}(R4)
and bring it down the tree directly to R4. There are certain optimization rules, however I really don't know how to handle OR. Boolean query optimization rules
But how do I optimize the rest of the where? I was thinking about using a distributive rule to convert it to KNF,
(R1.A='1' ∨ R2.B='2') ∧ (R1.A='1' ∨ R3.C=R4.C)
which would allow me to handle both sentences myself. But I won't go on, especially in what order should I join or make cartesian production.
Here is the tree operator I am drawing:
source to share
A good way to deal with disjunctions during query optimization is to convert the selection condition to disjunctive normal form (DNF) and then rewrite the selection in the Selection Union (one per difference).
those. apply rule # 2 here: https://en.wikipedia.org/wiki/Relational_algebra#Breaking_up_selections_with_complex_conditions
Like most query optimization tricks, it works well in some cases and not in others - which is why SQL optimizers are looking for plan space while trying to come up with a decent one.
source to share