Is this shuffling algorithm correct?
Below is the shuffle algorithm implemented in ruby:
def shuffle03!(arr)
len = arr.length
for i in 0..len-1
index1 = Random.rand(0..len-1)
index2 = Random.rand(0..len-1)
arr[index1], arr[index2] = arr[index2], arr[index1]
end
end
I tested this algorithm by counting:
class ShuffleTest
def initialize(seed)
len = seed.length
@count = {}
for i in 0..len-1
@count[seed[i]] = Array.new(len, 0)
end
end
def test(arr)
for i in 0...arr.length
@count[arr[i]][i] += 1
end
end
def show_count
return @count
end
end
def shuffle03!(arr)
len = arr.length
for i in 0..len-1
index1 = Random.rand(0..len-1)
index2 = Random.rand(0..len-1)
arr[index1], arr[index2] = arr[index2], arr[index1]
end
end
arr = ['a', 'b', 'c', 'd']
st = ShuffleTest.new(arr)
for x in 0..100_0000
shuffle03!(arr)
st.test(arr)
end
st.show_count.each do |k, v|
puts k
p v
end
result:
a
[250418, 249105, 249553, 250925]
b
[249372, 250373, 250785, 249471]
c
[250519, 250097, 249369, 250016]
d
[249692, 250426, 250294, 249589]
This semms will be correct. However, I don't know how to prove it using mathematical statistics. So I'm not sure if this is correct.
source to share
No, itβs wrong.
Imagine you have a four-element list, [A, B, C, D]. Note that:
- There are 4! = 24 possible permutations. For this to be a correct shuffle algorithm, each of these permutations must be equally likely.
- You are generating 4x; 2 = 8 random numbers, each in the range 0 - 3, for a total of 4 8 = 65,536 possible random number sequences. Each of these sequences is equally likely.
- 65,536 is not divisible by 24, so there is no way your algorithm can map 65,536 possible sequences of random numbers into permutations in such a way as to assign an equal number of sequences of random numbers (and therefore equal probability) to each permutation.
To see this in a test, you can create a variation of your own shuffle03!
, which instead of using a random generator, takes a list of eight indices and uses them. ( shuffle03!
can be implemented by generating eight random indices and then calling that variant as a helper function.) Then your test will iterate over all 4096 possible sequences, and for each, create a list of four items [A, B, C, D]. and then call the variant method to see the resulting permutation. The test can count how often each permutation appears and use that to find which permutations appear more times than others. You will find the following:
Permutation # of Occurrences
------------- ------------------
A B C D 4480
A B D C 3072
A C B D 3072
A C D B 2880
A D B C 2880
A D C B 3072
B A C D 3072
B A D C 2432
B C A D 2880
B C D A 2048
B D A C 2048
B D C A 2880
C A B D 2880
C A D B 2048
C B A D 3072
C B D A 2880
C D A B 2432
C D B A 2048
D A B C 2048
D A C B 2880
D B A C 2880
D B C A 3072
D C A B 2048
D C B A 2432
As you can see, items tend to end in the same order in which they started; for example, A B C D
is the most common permutation. We can highlight one aspect of this by seeing, for each pair of elements, how often they end up in the same order as the opposite order. We find:
Elements Same Order Opposite Order
---------- ------------ ----------------
A and B 33792 31744
A and C 34816 30720
A and D 35840 29696
B and C 33792 31744
B and D 34816 30720
C and D 33792 31744
Thus, some pairs are more likely than others to end up in the opposite order, but each pair is likely to end up in the same order as in the reverse order.
You can reduce the imbalance by doing more passes, but since no force 8 is divisible by 24, it will be impossible to make all permutations equally likely.
By the way, if your actual goal is a good shuffle algorithm (and not just a learning experience of computing one for yourself), then you should use Fisher-Yates shuffles .
Of course, since you are using Ruby, you can work around the whole problem simply by using Array.shuffle!
which Shuffle Fisher-Yates does for you.
source to share
I would like to suggest a Ruby way to achieve your goal.
Obviously you cannot use Array # shuffle , but (luckily!) You can use Kernel # rand . (I am assuming you cannot use Array # sample since: arr.sample(arr.size)
has the same effect as arr.shuffle
.)
There are many ways to implement shuffle
that are statistically significant (on the assumption that it rand(n)
creates truly random numbers between 0
and n-1
, which of course is impossible, but that's a reasonable guess). Here's one way:
class Array
def shuffle
arr = self.dup
map { arr.delete_at(rand(arr.size)) }
end
end
Try:
arr = [4,:a,5,6,'b',7,8]
arr.shuffle #=> [6, 8, "b", 5, 4, :a, 7]
arr.shuffle #=> [5, :a, 8, 4, "b", 7, 6]
arr.shuffle #=> [6, 8, 5, 7, "b", :a, 4]
arr.shuffle #=> [6, 4, 7, 8, 5, :a, "b"]
arr.shuffle #=> [:a, 4, "b", 5, 7, 8, 6]
arr.shuffle #=> ["b", 4, 7, 8, :a, 6, 5]
source to share