How can I slice the array, avoiding duplicate values ​​in each chunk?

Suppose I have this array:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

      

a.each_slice (2) .to_a will generate pairs, but these pairs will contain non-unique values ​​such as [3,3]. So I guess I am looking for some sort of unique_each_slice method.

I want to be able to shuffle this array until I get to the point where I have unique pairs of 2 (doesn't have to be 2, can be anything), for example (using 2 example)

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3]

      

If you do each_slice (2) on this array, you get unique pairs:

[[3, 1], [3, 7], [6, 3], [4, 5], [8, 3], [9, 3], [2, 3], [6, 3], [3, 11], [10, 3]]

      

compared to the original where you have:

[[1, 2], [3, 3], [3, 3], [3, 3], [3, 3], [3, 4], [5, 6], [6, 7], [8, 9], [10, 11]]

      

with non-repeating pairs in each, like [3,3]

Another example, suppose I have:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17]

      

Now, suppose there is some function a.unique_slices_of (3), I would get:

[[4, 16, 3], [1, 9, 3], [3, 6, 17], [3, 6, 10], [15, 3, 2], [3, 8, 12], [11, 3, 14], [7, 13, 3], [3, 5]]

      

By "unique slice" I mean a slice where the same number is not repeated twice: [1,2,3] is a unique slice, [3,1,3] is not.

So far, I have come up with the following method, which seems to take several iterations before everything turns out right:

class Array
  def unique_slices_of!(slices)
    loop do
      unique = true
      self.each_slice(slices) do |slice|
        if slice != slice.uniq
          self.shuffle!
          unique = false # so we know whether to loop again
          break
        end
      end
      break if unique # if unique didn't change, that means all slices were equal
      if unique == false then unique == true end # reset and start again
    end
    self 
  end
end

      

The main problem with my code is that a) I don't think I'm using some idiomatic Ruby method that can cut this process in half or more. b) Possibility of an infinite loop if the array simply cannot contain unique slices. I will probably need to use some combination theory here, but I'm not sure how.

+3


source to share


4 answers


a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

      

You can check if slices are "unique":

a.each_slice(2).all?{|x| x == x.uniq}

      

So now you just shuffle until you get what you want:



a.shuffle! until a.each_slice(2).all?{|x| x == x.uniq}

      

The easiest way to avoid an infinite loop is timeout

:

require 'timeout'
# raise an error if it takes more than 1 second
timeout(1){ a.shuffle! until a.each_slice(3).all?{|x| x == x.uniq} }

      

+2


source


Unique sample combinations

If you are looking for something more idiomatic and if algorithm efficiency is not your main concern, you can try the following:

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)

      

The main disadvantage of this approach is speed when a is large, because the combination of Array # will generate all possible combinations before you start down the results with Array # reject and Array # sample . However, for modest sized arrays, this certainly seems to be fast enough.

Evaluating Solution Performance

Routine testing shows that this is more than fast enough for modest-sized arrays. Consider:

require 'benchmark'

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

Benchmark.measure do
  a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
end.to_s
#=> "  0.000000   0.000000   0.000000 (  0.000052)\n"

      

Even with 100,000 iterations, my system still only had 3,650299 seconds. This seems fast enough for practical use given your placed enclosure, but your mileage may vary.

Allowing Arbitrary Subarray Sizing



Comparing terms with a graph

In the comments, the OP asked if this could be generalized to winnow sub-arrays with 2, 3, or 4 elements each. Yes, with a little refactoring, although performance degrades as the number of items in combination increases. Consider:

array = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
element_size = 4 

array.combination(element_size).
  reject { |element| element.map { |member| element.count(member) > 1 }.any? }.
      sample(array.size / element_size)

      

The desired element_size is used to determine the number of samples that are required dynamically. This has the side effect of discarding any partially filled arrays, eliminating the "saggy" elements you get with # each_slice .

The workhorse here is still the rejection method, which now iterates over each member of each submatrix using # count and rejects the elements that have # any? terms that appear more than once in this submatrix. Even with better variable names, this is a little more complicated than when we have a fixed element size, but it is definitely more flexible.

More readable (and slightly faster) comparison

With a hat tip to @pguardiario (see this linked answer ), you can shorten this a little more and make it more readable by only selecting sub-arrays where all members of the array are # uniq . For example:

array.combination(element_size).
  select { |subarray| subarray == subarray.uniq }.
    sample(array.size / element_size)

      

+2


source


I have a solution that seems to work. The main idea is to distribute the elements with the maximum number of samples to the maximum possible number of fragments. Add a few shuffle

to make them seem random.

class Array
  def unique_slices_of(slice_length)
    buf = []
    arr = []
    hash = Hash.new 0
    self.each {|i| hash[i] += 1}
    sorted = hash.sort_by {|k, v| v}.reverse
    # sorted[][0] holds the element and sorted[][1] holds the count
    return nil if sorted[0][1] > ((self.length * 1.0) / slice_length).ceil
    index = 0
    until sorted.length.zero?
      # Add element to buf and decrement count
      # if count == 0, remove the entry from sorted
      buf << sorted[index][0]
      sorted[index][1] -= 1
      if sorted[index][1] == 0
        sorted.delete_at index
        break if sorted.length == 0
        index -= 1
      end
      index = (index + 1) % sorted.length
      if buf.length == slice_length
        arr << buf.shuffle
        buf.clear
        index = 0
      end
    end
    arr << buf.shuffle if buf.length > 0
    arr.shuffle
  end
end

      

Output:

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3].unique_slices_of(2)
#=> [[8, 3], [3, 6], [3, 5], [1, 10], [3, 4], [3, 6], [7, 3], [9, 3], [3, 11], [3, 2]]

[1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17].unique_slices_of(3)
#=> [[3, 9, 6], [6, 14, 3], [3, 2, 17], [7, 3, 16], [3, 11, 10], [15, 3, 8], [4, 3, 5], [3, 13, 12], [1, 3]]

      

0


source


A non-random way to do it

The idea here is to put different values ​​in bins. Then, if there are leftover bins:

  • Order bins by size, biggest baskets first
  • Make a snippet by taking a number for each of the first max_slice_size

    boxes
  • Remove empty cells

Since each value within a slice is taken from a different bin, it is guaranteed that the slice will contain different values.

Code:

def slices_without_repeats(a, max_slice_size)
  slices = []
  bins = a.group_by { |e| e }.values
  until bins.empty?
    bins = bins.sort_by(&:size).reverse
    slice_size = [max_slice_size, bins.size].min
    slice = slice_size.times.map do |i|
      bins[i].pop
    end
    slices << slice
    bins.reject!(&:empty?)
    if slice.size < max_slice_size && !bins.empty?
      raise ArgumentError, "An element repeats too much"
    end
  end
  slices
end

      

This algorithm does not use sheer randomness. It uses Ruby's quicksort, which is unstable and can potentially exploit randomness (as when choosing pivot points).

Using:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
p slices_without_repeats(a, 2)
# [[3, 6], [3, 9], [3, 7], [3, 2], [3, 6],
#  [3, 10], [3, 11], [3, 4], [1, 8], [3, 5]]

      

Determines when this is not possible:

p slices_without_repeats(a, 3)
# An element repeats too much (ArgumentError)

      

And it handles the case where the last chunk is not full:

p slices_without_repeats([1, 2, 3, 3, 4, 4, 4], 3)
# [[4, 3, 2], [4, 3, 1], [4]]

      

0


source







All Articles