Is there a way to sort the split arrays?

I have three arrays of integers ( arr1

, arr2

, arr3

), stored in three different databases ( db1.table

, db2.table

, db3.table

). I cannot retrieve all the arrays from the tables because they are too large to be stored in memory.

I need to get all sorted data from an entire array (concatenation of arrays), sorted in ascending order from 10*n

to 10*(n+1)

indices where n >= 0

. I cannot get all the arrays, find their union and sort it. But I can get sorted data in ascending order from 10*n

to 10*(n+1)

for any of these arrays.

arr1        arr2           arr3
 2           5              2
 43          2              234
 234         2              235
 23          5              234
 2           10             46
 1           17             456
 423        233             6
 2          422             46
 54         23              345
 ...............................

      

Is there a better algorithm for this than a silly all-array search?

+3


source to share


1 answer


Do cross-database merging, most engines will allow you to do this if the databases are in the same node or cluster.

mysql> select * from test.nums;
+------+
| num  |
+------+
|    1 |
|    3 |
|    5 |
|    7 |
|    9 |
+------+
mysql> select * from test2.nums;
+------+
| num  |
+------+
|    2 |
|    4 |
|    6 |
|    8 |
+------+
mysql> select * from test.nums union select * from test2.nums order by num limit 3, 6;
+------+
| num  |
+------+
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
+------+

      

If for some reason this doesn't work for you, the way to do it in code is to use cursors and three-way merge.



Assuming you can do "SELECT num from test.nums ORDER BY num" and another "SELECT num from test2.nums ORDER BY NUM" (and a third time), then you would have three cursors to iterate over the result set in your code. Now it depends on the implementation of the driver connecting to the database, but the cursor usually works with a sliding window to manage memory for you. This means that you can transparently iterate over the entire recordset, but only portions of it at a time in memory.

Once you have three cursors, start repeating the sorting of the lists of items one at a time. If you remember the mergesort algorithm, this is similar to the merge step, with only three lists instead of two.

Another thing you need to do while iterating over cursors is to simply ignore every element you would add to your sorted merged array up to position x * 100, start adding there and break your loop when you reach (x +1) * 100

+2


source







All Articles