How can I slice the array, avoiding duplicate values ββin each chunk?
Suppose I have this array:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
a.each_slice (2) .to_a will generate pairs, but these pairs will contain non-unique values ββsuch as [3,3]. So I guess I am looking for some sort of unique_each_slice method.
I want to be able to shuffle this array until I get to the point where I have unique pairs of 2 (doesn't have to be 2, can be anything), for example (using 2 example)
[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3]
If you do each_slice (2) on this array, you get unique pairs:
[[3, 1], [3, 7], [6, 3], [4, 5], [8, 3], [9, 3], [2, 3], [6, 3], [3, 11], [10, 3]]
compared to the original where you have:
[[1, 2], [3, 3], [3, 3], [3, 3], [3, 3], [3, 4], [5, 6], [6, 7], [8, 9], [10, 11]]
with non-repeating pairs in each, like [3,3]
Another example, suppose I have:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17]
Now, suppose there is some function a.unique_slices_of (3), I would get:
[[4, 16, 3], [1, 9, 3], [3, 6, 17], [3, 6, 10], [15, 3, 2], [3, 8, 12], [11, 3, 14], [7, 13, 3], [3, 5]]
By "unique slice" I mean a slice where the same number is not repeated twice: [1,2,3] is a unique slice, [3,1,3] is not.
So far, I have come up with the following method, which seems to take several iterations before everything turns out right:
class Array
def unique_slices_of!(slices)
loop do
unique = true
self.each_slice(slices) do |slice|
if slice != slice.uniq
self.shuffle!
unique = false # so we know whether to loop again
break
end
end
break if unique # if unique didn't change, that means all slices were equal
if unique == false then unique == true end # reset and start again
end
self
end
end
The main problem with my code is that a) I don't think I'm using some idiomatic Ruby method that can cut this process in half or more. b) Possibility of an infinite loop if the array simply cannot contain unique slices. I will probably need to use some combination theory here, but I'm not sure how.
source to share
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
You can check if slices are "unique":
a.each_slice(2).all?{|x| x == x.uniq}
So now you just shuffle until you get what you want:
a.shuffle! until a.each_slice(2).all?{|x| x == x.uniq}
The easiest way to avoid an infinite loop is timeout
:
require 'timeout'
# raise an error if it takes more than 1 second
timeout(1){ a.shuffle! until a.each_slice(3).all?{|x| x == x.uniq} }
source to share
Unique sample combinations
If you are looking for something more idiomatic and if algorithm efficiency is not your main concern, you can try the following:
a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11] a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
The main disadvantage of this approach is speed when a is large, because the combination of Array # will generate all possible combinations before you start down the results with Array # reject and Array # sample . However, for modest sized arrays, this certainly seems to be fast enough.
Evaluating Solution Performance
Routine testing shows that this is more than fast enough for modest-sized arrays. Consider:
require 'benchmark'
a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
Benchmark.measure do
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
end.to_s
#=> " 0.000000 0.000000 0.000000 ( 0.000052)\n"
Even with 100,000 iterations, my system still only had 3,650299 seconds. This seems fast enough for practical use given your placed enclosure, but your mileage may vary.
Allowing Arbitrary Subarray Sizing
Comparing terms with a graph
In the comments, the OP asked if this could be generalized to winnow sub-arrays with 2, 3, or 4 elements each. Yes, with a little refactoring, although performance degrades as the number of items in combination increases. Consider:
array = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
element_size = 4
array.combination(element_size).
reject { |element| element.map { |member| element.count(member) > 1 }.any? }.
sample(array.size / element_size)
The desired element_size is used to determine the number of samples that are required dynamically. This has the side effect of discarding any partially filled arrays, eliminating the "saggy" elements you get with # each_slice .
The workhorse here is still the rejection method, which now iterates over each member of each submatrix using # count and rejects the elements that have # any? terms that appear more than once in this submatrix. Even with better variable names, this is a little more complicated than when we have a fixed element size, but it is definitely more flexible.
More readable (and slightly faster) comparison
With a hat tip to @pguardiario (see this linked answer ), you can shorten this a little more and make it more readable by only selecting sub-arrays where all members of the array are # uniq . For example:
array.combination(element_size).
select { |subarray| subarray == subarray.uniq }.
sample(array.size / element_size)
source to share
I have a solution that seems to work. The main idea is to distribute the elements with the maximum number of samples to the maximum possible number of fragments. Add a few shuffle
to make them seem random.
class Array
def unique_slices_of(slice_length)
buf = []
arr = []
hash = Hash.new 0
self.each {|i| hash[i] += 1}
sorted = hash.sort_by {|k, v| v}.reverse
# sorted[][0] holds the element and sorted[][1] holds the count
return nil if sorted[0][1] > ((self.length * 1.0) / slice_length).ceil
index = 0
until sorted.length.zero?
# Add element to buf and decrement count
# if count == 0, remove the entry from sorted
buf << sorted[index][0]
sorted[index][1] -= 1
if sorted[index][1] == 0
sorted.delete_at index
break if sorted.length == 0
index -= 1
end
index = (index + 1) % sorted.length
if buf.length == slice_length
arr << buf.shuffle
buf.clear
index = 0
end
end
arr << buf.shuffle if buf.length > 0
arr.shuffle
end
end
Output:
[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3].unique_slices_of(2)
#=> [[8, 3], [3, 6], [3, 5], [1, 10], [3, 4], [3, 6], [7, 3], [9, 3], [3, 11], [3, 2]]
[1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17].unique_slices_of(3)
#=> [[3, 9, 6], [6, 14, 3], [3, 2, 17], [7, 3, 16], [3, 11, 10], [15, 3, 8], [4, 3, 5], [3, 13, 12], [1, 3]]
source to share
A non-random way to do it
The idea here is to put different values ββin bins. Then, if there are leftover bins:
- Order bins by size, biggest baskets first
- Make a snippet by taking a number for each of the first
max_slice_size
boxes - Remove empty cells
Since each value within a slice is taken from a different bin, it is guaranteed that the slice will contain different values.
Code:
def slices_without_repeats(a, max_slice_size)
slices = []
bins = a.group_by { |e| e }.values
until bins.empty?
bins = bins.sort_by(&:size).reverse
slice_size = [max_slice_size, bins.size].min
slice = slice_size.times.map do |i|
bins[i].pop
end
slices << slice
bins.reject!(&:empty?)
if slice.size < max_slice_size && !bins.empty?
raise ArgumentError, "An element repeats too much"
end
end
slices
end
This algorithm does not use sheer randomness. It uses Ruby's quicksort, which is unstable and can potentially exploit randomness (as when choosing pivot points).
Using:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
p slices_without_repeats(a, 2)
# [[3, 6], [3, 9], [3, 7], [3, 2], [3, 6],
# [3, 10], [3, 11], [3, 4], [1, 8], [3, 5]]
Determines when this is not possible:
p slices_without_repeats(a, 3)
# An element repeats too much (ArgumentError)
And it handles the case where the last chunk is not full:
p slices_without_repeats([1, 2, 3, 3, 4, 4, 4], 3)
# [[4, 3, 2], [4, 3, 1], [4]]
source to share