Two-way merge sort of multiple paths
This is a question from Database Systems The Complete Book, Second Edition - Chapter 15: Two-Pass Sort-Based Algorithm. "Sometimes it is possible to save some disk I / O if we leave the last sublist in memory. It might make sense to use a sublist of fewer blocks to take advantage of this effect. How many disk I / O will be saved this way?"
I figured you would split the original relation into sub-lists and sort them in the first pass, and keep the last list in memory, which will take less than block M-1. Then how do you progress with sorting?
source to share
This is just a guess, but I suspect the answer can be summarized as follows. The standard merge sort "in order" looks like this:
1 1 1 1 1 1 1 1
--- --- --- --- -- pass 1
2 2 2 2
----- ----- -- pass 2
4 4
--------- -- pass 3
8
Note that we are doing a full pass of the input before moving on to the next level.
An alternative is a "subtree at a point in time" merge sort, which looks like this:
1 1 1 1 1 1 1 1
--- | | | | | |
2 --- | | | |
| 2 | | | |
----- | | | |
4 --- | |
| 2 ---
| | 2
| -----
| 4
---------
8
Here, we merge each subtree with its neighbor of the same height as soon as that neighbor is built. We are doing the same job, but the terrain is improving.
Greetings.
source to share