How can I slice the array, avoiding duplicate values in each chunk?

Question

How can I slice the array, avoiding duplicate values in each chunk?

Suppose I have this array:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

a.each_slice (2) .to_a will generate pairs, but these pairs will contain non-unique values such as [3,3]. So I guess I am looking for some sort of unique_each_slice method.

I want to be able to shuffle this array until I get to the point where I have unique pairs of 2 (doesn't have to be 2, can be anything), for example (using 2 example)

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3]

If you do each_slice (2) on this array, you get unique pairs:

[[3, 1], [3, 7], [6, 3], [4, 5], [8, 3], [9, 3], [2, 3], [6, 3], [3, 11], [10, 3]]

compared to the original where you have:

[[1, 2], [3, 3], [3, 3], [3, 3], [3, 3], [3, 4], [5, 6], [6, 7], [8, 9], [10, 11]]

with non-repeating pairs in each, like [3,3]

Another example, suppose I have:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17]

Now, suppose there is some function a.unique_slices_of (3), I would get:

[[4, 16, 3], [1, 9, 3], [3, 6, 17], [3, 6, 10], [15, 3, 2], [3, 8, 12], [11, 3, 14], [7, 13, 3], [3, 5]]

By "unique slice" I mean a slice where the same number is not repeated twice: [1,2,3] is a unique slice, [3,1,3] is not.

So far, I have come up with the following method, which seems to take several iterations before everything turns out right:

class Array
  def unique_slices_of!(slices)
    loop do
      unique = true
      self.each_slice(slices) do |slice|
        if slice != slice.uniq
          self.shuffle!
          unique = false # so we know whether to loop again
          break
        end
      end
      break if unique # if unique didn't change, that means all slices were equal
      if unique == false then unique == true end # reset and start again
    end
    self 
  end
end

The main problem with my code is that a) I don't think I'm using some idiomatic Ruby method that can cut this process in half or more. b) Possibility of an infinite loop if the array simply cannot contain unique slices. I will probably need to use some combination theory here, but I'm not sure how.

+3

arrays ruby random-sample

daremkd 05 oct. 14 at 10:40

source to share

4 answers

Unique sample combinations

If you are looking for something more idiomatic and if algorithm efficiency is not your main concern, you can try the following:

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)

The main disadvantage of this approach is speed when a is large, because the combination of Array # will generate all possible combinations before you start down the results with Array # reject and Array # sample . However, for modest sized arrays, this certainly seems to be fast enough.

Evaluating Solution Performance

Routine testing shows that this is more than fast enough for modest-sized arrays. Consider:

require 'benchmark'

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

Benchmark.measure do
  a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
end.to_s
#=> "  0.000000   0.000000   0.000000 (  0.000052)\n"

Even with 100,000 iterations, my system still only had 3,650299 seconds. This seems fast enough for practical use given your placed enclosure, but your mileage may vary.

Allowing Arbitrary Subarray Sizing

Comparing terms with a graph

In the comments, the OP asked if this could be generalized to winnow sub-arrays with 2, 3, or 4 elements each. Yes, with a little refactoring, although performance degrades as the number of items in combination increases. Consider:

array = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
element_size = 4 

array.combination(element_size).
  reject { |element| element.map { |member| element.count(member) > 1 }.any? }.
      sample(array.size / element_size)

The desired element_size is used to determine the number of samples that are required dynamically. This has the side effect of discarding any partially filled arrays, eliminating the "saggy" elements you get with # each_slice .

The workhorse here is still the rejection method, which now iterates over each member of each submatrix using # count and rejects the elements that have # any? terms that appear more than once in this submatrix. Even with better variable names, this is a little more complicated than when we have a fixed element size, but it is definitely more flexible.

More readable (and slightly faster) comparison

With a hat tip to @pguardiario (see this linked answer ), you can shorten this a little more and make it more readable by only selecting sub-arrays where all members of the array are # uniq . For example:

array.combination(element_size).
  select { |subarray| subarray == subarray.uniq }.
    sample(array.size / element_size)

+2

Todd A. Jacobs 05 oct. 14 at 18:24

source to share

I have a solution that seems to work. The main idea is to distribute the elements with the maximum number of samples to the maximum possible number of fragments. Add a few shuffle

to make them seem random.

class Array
  def unique_slices_of(slice_length)
    buf = []
    arr = []
    hash = Hash.new 0
    self.each {|i| hash[i] += 1}
    sorted = hash.sort_by {|k, v| v}.reverse
    # sorted[][0] holds the element and sorted[][1] holds the count
    return nil if sorted[0][1] > ((self.length * 1.0) / slice_length).ceil
    index = 0
    until sorted.length.zero?
      # Add element to buf and decrement count
      # if count == 0, remove the entry from sorted
      buf << sorted[index][0]
      sorted[index][1] -= 1
      if sorted[index][1] == 0
        sorted.delete_at index
        break if sorted.length == 0
        index -= 1
      end
      index = (index + 1) % sorted.length
      if buf.length == slice_length
        arr << buf.shuffle
        buf.clear
        index = 0
      end
    end
    arr << buf.shuffle if buf.length > 0
    arr.shuffle
  end
end

Output:

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3].unique_slices_of(2)
#=> [[8, 3], [3, 6], [3, 5], [1, 10], [3, 4], [3, 6], [7, 3], [9, 3], [3, 11], [3, 2]]

[1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17].unique_slices_of(3)
#=> [[3, 9, 6], [6, 14, 3], [3, 2, 17], [7, 3, 16], [3, 11, 10], [15, 3, 8], [4, 3, 5], [3, 13, 12], [1, 3]]

0

rohit89 05 oct. 14 at 21:24

source to share

A non-random way to do it

The idea here is to put different values in bins. Then, if there are leftover bins:

Order bins by size, biggest baskets first
Make a snippet by taking a number for each of the first max_slice_size

boxes
Remove empty cells

Since each value within a slice is taken from a different bin, it is guaranteed that the slice will contain different values.

Code:

def slices_without_repeats(a, max_slice_size)
  slices = []
  bins = a.group_by { |e| e }.values
  until bins.empty?
    bins = bins.sort_by(&:size).reverse
    slice_size = [max_slice_size, bins.size].min
    slice = slice_size.times.map do |i|
      bins[i].pop
    end
    slices << slice
    bins.reject!(&:empty?)
    if slice.size < max_slice_size && !bins.empty?
      raise ArgumentError, "An element repeats too much"
    end
  end
  slices
end

This algorithm does not use sheer randomness. It uses Ruby's quicksort, which is unstable and can potentially exploit randomness (as when choosing pivot points).

Using:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
p slices_without_repeats(a, 2)
# [[3, 6], [3, 9], [3, 7], [3, 2], [3, 6],
#  [3, 10], [3, 11], [3, 4], [1, 8], [3, 5]]

Determines when this is not possible:

p slices_without_repeats(a, 3)
# An element repeats too much (ArgumentError)

And it handles the case where the last chunk is not full:

p slices_without_repeats([1, 2, 3, 3, 4, 4, 4], 3)
# [[4, 3, 2], [4, 3, 1], [4]]

0

Wayne conrad 06 oct. 14 at 15:02

source to share

pguardiario · Accepted Answer · 2014-10-06T02:01:29+0000

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

You can check if slices are "unique":

a.each_slice(2).all?{|x| x == x.uniq}

So now you just shuffle until you get what you want:

a.shuffle! until a.each_slice(2).all?{|x| x == x.uniq}

The easiest way to avoid an infinite loop is timeout

:

require 'timeout'
# raise an error if it takes more than 1 second
timeout(1){ a.shuffle! until a.each_slice(3).all?{|x| x == x.uniq} }

How can I slice the array, avoiding duplicate values ​​in each chunk?

Unique sample combinations

Evaluating Solution Performance

Allowing Arbitrary Subarray Sizing

Comparing terms with a graph

More readable (and slightly faster) comparison

A non-random way to do it

More articles:

How can I slice the array, avoiding duplicate values in each chunk?