Is this shuffling algorithm correct?

Question

Is this shuffling algorithm correct?

Below is the shuffle algorithm implemented in ruby:

def shuffle03!(arr)
    len = arr.length
    for i in 0..len-1
        index1 = Random.rand(0..len-1)
        index2 = Random.rand(0..len-1)
        arr[index1], arr[index2] = arr[index2], arr[index1]
    end
end

I tested this algorithm by counting:

class ShuffleTest
    def initialize(seed)
        len = seed.length
        @count = {}
        for i in 0..len-1
            @count[seed[i]] = Array.new(len, 0)
        end
    end
    def test(arr)
        for i in 0...arr.length
            @count[arr[i]][i] += 1
        end
    end
    def show_count
        return @count
    end
end


def shuffle03!(arr)
    len = arr.length
    for i in 0..len-1
        index1 = Random.rand(0..len-1)
        index2 = Random.rand(0..len-1)
        arr[index1], arr[index2] = arr[index2], arr[index1]
    end
end


arr = ['a', 'b', 'c', 'd']

st = ShuffleTest.new(arr)

for x in 0..100_0000
    shuffle03!(arr)
    st.test(arr)
end

st.show_count.each do |k, v|
    puts k
    p v
end

result:

a
[250418, 249105, 249553, 250925]
b
[249372, 250373, 250785, 249471]
c
[250519, 250097, 249369, 250016]
d
[249692, 250426, 250294, 249589]

This semms will be correct. However, I don't know how to prove it using mathematical statistics. So I'm not sure if this is correct.

+3

algorithm ruby

letiantian 24 Aug '14 at 2:18

source to share

2 answers

I would like to suggest a Ruby way to achieve your goal.

Obviously you cannot use Array # shuffle , but (luckily!) You can use Kernel # rand . (I am assuming you cannot use Array # sample since: arr.sample(arr.size)

has the same effect as arr.shuffle

.)

There are many ways to implement shuffle

that are statistically significant (on the assumption that it rand(n)

creates truly random numbers between 0

and n-1

, which of course is impossible, but that's a reasonable guess). Here's one way:

class Array
  def shuffle
    arr = self.dup
    map { arr.delete_at(rand(arr.size)) }
  end
end

Try:

arr = [4,:a,5,6,'b',7,8]

arr.shuffle #=> [6,   8, "b", 5,   4, :a,   7]
arr.shuffle #=> [5,  :a,   8, 4, "b",  7,   6]
arr.shuffle #=> [6,   8,   5, 7, "b", :a,   4]
arr.shuffle #=> [6,   4,   7, 8,   5, :a, "b"]
arr.shuffle #=> [:a,  4, "b", 5,   7,  8,   6]
arr.shuffle #=> ["b", 4,   7, 8,  :a,  6,   5]

+2

Cary swoveland 24 Aug 14 at 3:54

source to share

ruakh · Accepted Answer · 2014-08-24T02:38:20+0000

No, it’s wrong.

Imagine you have a four-element list, [A, B, C, D]. Note that:

There are 4! = 24 possible permutations. For this to be a correct shuffle algorithm, each of these permutations must be equally likely.
You are generating 4x; 2 = 8 random numbers, each in the range 0 - 3, for a total of 4 ⁸ = 65,536 possible random number sequences. Each of these sequences is equally likely.
65,536 is not divisible by 24, so there is no way your algorithm can map 65,536 possible sequences of random numbers into permutations in such a way as to assign an equal number of sequences of random numbers (and therefore equal probability) to each permutation.

To see this in a test, you can create a variation of your own shuffle03!

, which instead of using a random generator, takes a list of eight indices and uses them. ( shuffle03!

can be implemented by generating eight random indices and then calling that variant as a helper function.) Then your test will iterate over all 4096 possible sequences, and for each, create a list of four items [A, B, C, D]. and then call the variant method to see the resulting permutation. The test can count how often each permutation appears and use that to find which permutations appear more times than others. You will find the following:

 Permutation    # of Occurrences
-------------  ------------------
 A B C D                    4480
 A B D C                    3072
 A C B D                    3072
 A C D B                    2880
 A D B C                    2880
 A D C B                    3072
 B A C D                    3072
 B A D C                    2432
 B C A D                    2880
 B C D A                    2048
 B D A C                    2048
 B D C A                    2880
 C A B D                    2880
 C A D B                    2048
 C B A D                    3072
 C B D A                    2880
 C D A B                    2432
 C D B A                    2048
 D A B C                    2048
 D A C B                    2880
 D B A C                    2880
 D B C A                    3072
 D C A B                    2048
 D C B A                    2432

As you can see, items tend to end in the same order in which they started; for example, A B C D

is the most common permutation. We can highlight one aspect of this by seeing, for each pair of elements, how often they end up in the same order as the opposite order. We find:

 Elements    Same Order    Opposite Order
----------  ------------  ----------------
 A and B          33792             31744
 A and C          34816             30720
 A and D          35840             29696
 B and C          33792             31744
 B and D          34816             30720
 C and D          33792             31744

Thus, some pairs are more likely than others to end up in the opposite order, but each pair is likely to end up in the same order as in the reverse order.

You can reduce the imbalance by doing more passes, but since no force 8 is divisible by 24, it will be impossible to make all permutations equally likely.

By the way, if your actual goal is a good shuffle algorithm (and not just a learning experience of computing one for yourself), then you should use Fisher-Yates shuffles .

Of course, since you are using Ruby, you can work around the whole problem simply by using Array.shuffle!

which Shuffle Fisher-Yates does for you.

Is this shuffling algorithm correct?

More articles: