Stuck with optimizing boolean SQL queries in relational algebra (OR in WHERE)

I am stuck optimizing this SQL query in relational algebra:

SELECT * FROM R1, R2, R3, R4 
WHERE (R1.A = '1' OR (R2.B = '2' AND R3.C = R4.C)) AND R4.D = '4'

      

I translated it to the following Relational Algebra statement:

σ{R1.A='1'  ∨ (R2.B='2'  ∧  R3.C=R4.C) ∧ R4.D='4'}(R1 × R2 × R3 × R4)

      

My problem is that I don't know how to optimize the where clause. I know that I can convert the last condition to σ{R4.D='4'}(R4)

and bring it down the tree directly to R4. There are certain optimization rules, however I really don't know how to handle OR. Boolean query optimization rules

But how do I optimize the rest of the where? I was thinking about using a distributive rule to convert it to KNF,

(R1.A='1' ∨ R2.B='2')  ∧ (R1.A='1'  ∨  R3.C=R4.C) 

      

which would allow me to handle both sentences myself. But I won't go on, especially in what order should I join or make cartesian production.

Here is the tree operator I am drawing: OperatorTree

+3


source to share


2 answers


A union was not possible because it would need the same type of speakers. Now I have received an official decision from my mentor. As I thought, it was necessary to transform it with a distribution rule in order to convert it to KNF, so I have two separate sentences.



Solution for the problem

+1


source


A good way to deal with disjunctions during query optimization is to convert the selection condition to disjunctive normal form (DNF) and then rewrite the selection in the Selection Union (one per difference).

those. apply rule # 2 here: https://en.wikipedia.org/wiki/Relational_algebra#Breaking_up_selections_with_complex_conditions



Like most query optimization tricks, it works well in some cases and not in others - which is why SQL optimizers are looking for plan space while trying to come up with a decent one.

+1


source







All Articles