Two-way merge sort of multiple paths
This is a question from Database Systems The Complete Book, Second Edition - Chapter 15: Two-Pass Sort-Based Algorithm. "Sometimes it is possible to save some disk I / O if we leave the last sublist in memory. It might make sense to use a sublist of fewer blocks to take advantage of this effect. How many disk I / O will be saved this way?"
I figured you would split the original relation into sub-lists and sort them in the first pass, and keep the last list in memory, which will take less than block M-1. Then how do you progress with sorting?
This is just a guess, but I suspect the answer can be summarized as follows. The standard merge sort "in order" looks like this:
1 1 1 1 1 1 1 1
--- --- --- --- -- pass 1
2 2 2 2
----- ----- -- pass 2
4 4
--------- -- pass 3
8
Note that we are doing a full pass of the input before moving on to the next level.
An alternative is a "subtree at a point in time" merge sort, which looks like this:
1 1 1 1 1 1 1 1
--- | | | | | |
2 --- | | | |
| 2 | | | |
----- | | | |
4 --- | |
| 2 ---
| | 2
| -----
| 4
---------
8
Here, we merge each subtree with its neighbor of the same height as soon as that neighbor is built. We are doing the same job, but the terrain is improving.
Greetings.