Efficient joining of interval ranges in SQL

Suppose I have two tables (data taken from this SO post):

Table d1:

 x start end
 a     1   3
 b     5  11
 c    19  22
 d    30  39
 e     7  25

Table d2:

 x pos
 a   2
 a   3
 b   3
 b  12
 c  20
 d  52
 e  10

The first row in each table is the column headings. I would like to extract all rows of d2 where the column x matches d1 and pos falls between d1's start and end columns (boundary values included). That is, I would like to get this result:

 x pos start  end
 a   2     1    3
 a   3     1    3
 c  20    19   22
 e  10     7   25

      

The way I have done it so far is this:

SELECT * FROM d1 JOIN d2 USING (x) WHERE pos BETWEEN start AND end

But it's not clear to me whether this operation is as efficient as it can be (i.e. whether it is optimized internally). For example, computing the entire join first and then filtering is not a scalable approach IMHO (in terms of both speed and memory).
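For reference, the query can be exercised end to end; here is a minimal sketch using SQLite through Python's sqlite3 module, with the question's data. The composite index on d2(x, pos) is my own addition (not part of the original query) to let the engine seek on x and range-scan pos:

```python
import sqlite3

# In-memory SQLite database with the two tables from the question.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE d1 (x TEXT, start INTEGER, "end" INTEGER)')
con.execute("CREATE TABLE d2 (x TEXT, pos INTEGER)")
con.executemany("INSERT INTO d1 VALUES (?, ?, ?)",
                [("a", 1, 3), ("b", 5, 11), ("c", 19, 22),
                 ("d", 30, 39), ("e", 7, 25)])
con.executemany("INSERT INTO d2 VALUES (?, ?)",
                [("a", 2), ("a", 3), ("b", 3), ("b", 12),
                 ("c", 20), ("d", 52), ("e", 10)])

# Hypothetical composite index: the engine can seek on x, then
# range-scan pos, rather than examining the whole join product.
con.execute("CREATE INDEX idx_d2_x_pos ON d2 (x, pos)")

rows = con.execute(
    'SELECT x, pos, start, "end" FROM d1 JOIN d2 USING (x) '
    'WHERE pos BETWEEN start AND "end" ORDER BY x, pos'
).fetchall()
print(rows)
```

Note that end is quoted here because END is a keyword in SQLite; whether the unquoted form parses depends on the engine. Running EXPLAIN QUERY PLAN on the SELECT shows whether the index is actually used.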

Are there other efficient query optimizations (e.g. interval trees) or other algorithms that can handle ranges efficiently (again, in terms of both speed and memory) that I can use in SQL? It doesn't matter whether it uses SQLite, PostgreSQL, MySQL, etc.

What is the most efficient way to accomplish this operation in SQL?

Many thanks.



1 answer


Not sure how it all works internally, but depending on the situation, I would advise playing with a table that "rolls out" all the values of d1, and then joining against it. That way the query engine can pinpoint the correct entry exactly, rather than having to find a pair of bounds that bracket the expected value.

E.g.:

x value
a  1
a  2
a  3
b  5
b  6
b  7
b  8
b  9
b 10
b 11
c 19
...etc.



By putting an index on the value column (**), this should be quite a bit faster than joining with BETWEEN start AND end on the original table d1, IMHO.

Of course, every time you make changes to d1 you need to adjust the expanded table as well (a trigger?). If that happens frequently, you will spend more time updating the expanded table than you gained in the first place! It can also take up quite a bit of (disk) space if some of the intervals are really long; and it assumes we never need to search for non-integer values (for example, what if we look up 3.14?).

(** You might consider experimenting with a unique index on (value, x) here ...)
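The roll-out itself can be produced in SQL rather than maintained by hand; here is a sketch of this idea in SQLite via Python's sqlite3 module. The table name d1_expanded, the recursive CTE, and the unique (value, x) index are my own illustration, not something given in the answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE d1 (x TEXT, start INTEGER, "end" INTEGER)')
con.execute("CREATE TABLE d2 (x TEXT, pos INTEGER)")
con.executemany("INSERT INTO d1 VALUES (?, ?, ?)",
                [("a", 1, 3), ("b", 5, 11), ("c", 19, 22),
                 ("d", 30, 39), ("e", 7, 25)])
con.executemany("INSERT INTO d2 VALUES (?, ?)",
                [("a", 2), ("a", 3), ("b", 3), ("b", 12),
                 ("c", 20), ("d", 52), ("e", 10)])

# Roll each interval out into one row per integer value it covers.
con.execute("""
    CREATE TABLE d1_expanded AS
    WITH RECURSIVE expand(x, value, last) AS (
        SELECT x, start, "end" FROM d1
        UNION ALL
        SELECT x, value + 1, last FROM expand WHERE value < last
    )
    SELECT x, value FROM expand
""")
# The unique (value, x) index suggested above.
con.execute("CREATE UNIQUE INDEX idx_exp ON d1_expanded (value, x)")

n_expanded = con.execute("SELECT COUNT(*) FROM d1_expanded").fetchone()[0]

# Plain equality join: the engine seeks the exact (value, x) entry.
rows = con.execute("""
    SELECT d2.x, d2.pos FROM d2
    JOIN d1_expanded e ON e.x = d2.x AND e.value = d2.pos
    ORDER BY d2.x, d2.pos
""").fetchall()
print(n_expanded, rows)
```

With this toy data the 5 intervals already expand to 43 rows, so the space cost the answer warns about shows up quickly for long intervals; and non-integer lookups are still not covered.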







