F # Merge Sort - IndexOutOfRangeException when trying to implement a structure match

The general code for merge sort looks something like this:

   let remove array = 
    Array.sub array 1 (array.Length - 1)

let rec merge (chunkA : int[]) (chunkB : int[]) =
    if chunkA.Length = 0 && chunkB.Length = 0 then [||]
    else if chunkA.Length = 0 || chunkB.[0] < chunkA.[0] then Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
    else Array.append [| chunkA.[0] |] (merge (remove chunkA) chunkB)

let rec mergesort (array : int[]) =
    let middle = array.Length / 2
    let chunkA = match middle with
                    | 1 -> [| array.[0] |]
                    | _ -> mergesort  [| for i in 0 .. middle - 1 -> array.[i]|]
    let chunkB = match array.Length - middle with
                    | 1 -> [| array.[array.Length - 1] |]
                    | _ -> mergesort [| for i in middle .. array.Length - 1 -> array.[i]|]
    merge chunkA chunkB

      

This code worked fine, but I wanted to change the series of if statements in the function merge

to a statement match with

.

Then I tried to implement the following code:

let rec merge (chunkA : int[]) (chunkB : int[]) =
match chunkA.Length with
| 0 when chunkA.Length = chunkB.Length -> [||]
| 0 | _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
| _ -> Array.append [| chunkA.[0] |] (merge (remove chunkA) chunkB)

      

When I ran my code, Visual Studio threw an "IndexOutOfRangeException" at me, specifically here:

| 0 when chunkA.Length = chunkB.Length -> [||]

      

In this case, it chunkA

was empty, but chunkB

had one number in it. So I'm not entirely sure why F # even tried to return this case, since the lengths of chunks A and B were not the same, but I'm also confused as to why this would result in an index error, specifically on an empty array.

Also, I'm pretty new to F # and functional programming in general. If the structure or methodology in my code is not up to par, please feel free to comment on this.

Also, if I am fat, please do not hesitate to tell me about it.

Thanks a lot, Luke

+3


source to share


1 answer


As Fedor Soikin pointed out, the source of your exception is this line:

| 0 | _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))

      

But it may not be obvious to you why this is an exception. As I learned last week to my surprise , the when

match clause applies to all cases from the previous one ->

. In other words, when you write the line above, F # understands that you want the clause to when

apply to either case 0

or case _

. (This is, of course, redundant.)

And what's the reason for your exception: F # sees the case 0

, but still applies the test when chunkB.[0] < chunkA.[0]

- and since it's chunkA

empty, it will always throw an exception. To fix this, you'll have to separate the two cases, so that when

only applies to the case you mean to apply it:

| 0 -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
| _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))

      

Unfortunately, this means some code duplication. It doesn't really matter in this case, since the duplicated code is a one-liner, but if you have a significant piece of code that ends up being duplicated due to the need to separate two such cases (where two cases shouldn't be sharing a when

), then you can turn this the duplicated chunk into a function so that it is no longer duplicated.

Edit: I just noticed a section of your code that might be simpler. Your source code:

let chunkA = match middle with
                | 1 -> [| array.[0] |]
                | _ -> mergesort  [| for i in 0 .. middle - 1 -> array.[i]|]
let chunkB = match array.Length - middle with
                | 1 -> [| array.[array.Length - 1] |]
                | _ -> mergesort [| for i in middle .. array.Length - 1 -> array.[i]|]

      

What you're doing here is to take a slice of an array, but F # has a very convenient syntax for slice arrays: array.[start..end]

where start

and end

are the inclusive indices of the desired slice. Thus, the expression [| for i in 0 .. middle - 1 -> array.[i]|]

can be simplified to array.[0 .. middle - 1]

, and the expression [| for i in middle .. array.Length - 1 -> array.[i]|]

can be simplified to array.[middle .. array.Length - 1]

. Let's replace these expressions in your code and see what we get:



let chunkA = match middle with
                | 1 -> [| array.[0] |]
                | _ -> mergesort  array.[0 .. middle - 1]
let chunkB = match array.Length - middle with
                | 1 -> [| array.[array.Length - 1] |]
                | _ -> mergesort array.[middle .. array.Length - 1]

      

Now, looking at this, I notice that the condition 1

in both cases actually deals with the same slice of the array as the condition _

; The only difference is that if the mean is 1, you don't call mergesort

. How do I know this is the same slice of the array? Well, if middle

equal to 1, the expression array.[0 .. middle-1]

becomes array.[0..0]

, which is a slice of length 1 in the array, starting at index 0, exactly equivalent [| array.[0] |]

. And if there is array.Length

exactly 1 more than middle

, then the expression array.[middle .. array.Length - 1]

will be array.[middle .. middle]

, which is exactly equivalent [| array.[middle] |]

.

So, if it weren't for the call mergesort

, we could combine these two expressions. And there's a pretty easy way to do it, really! Just move the length check to the beginning mergesort

, for example:

let rec mergesort (array : int[]) =
    if array.Length < 2 then
        array  // Arrays of length 0 or 1 are already sorted
    else
        // rest of mergesort function goes here

      

And now you can safely combine the two cases of yours match

, knowing you won't end up in an infinite recursion loop:

let middle = array.Length / 2
let chunkA = mergesort array.[0 .. middle - 1]
let chunkB = mergesort array.[middle .. array.Length - 1]
merge chunkA chunkB

      

Putting it all together, my suggested version of your original function mergesort

looks like this:

let rec mergesort (array : int[]) =
    if array.Length < 2 then
        array  // Arrays of length 0 or 1 are already sorted
    else
        let middle = array.Length / 2
        let chunkA = mergesort array.[0 .. middle - 1]
        let chunkB = mergesort array.[middle .. array.Length - 1]
        merge chunkA chunkB

      

As a bonus, this version mergesort

doesn't have a subtle bug that your original version had: you forgot to consider the empty array case. Calling the original mergesort

on an empty array will create an infinite loop. You will probably benefit more from working on yourself than from me explaining how to do this, so I just mentioned that in F # for i in 0 .. -1

it is not a bug, but will go through zero loop time for

(i.e. the body of the loop for

will not performed). Likewise is array.[0..-1]

not an error, but creates an empty array. Armed with the knowledge of this detail, you will be able to work with the code of your original function mergesort

and see that it will loop infinitely if you passed it an empty string. (Although since your callmergesort

in this infinite loop is not in the tail position, it will not be a tail call. And so the stack will eventually overflow, saving you a "true" infinite loop).

+3


source







All Articles