Ruby compare two hash arrays, with specific keys
There are two hash arrays and I want to remove the "common" elements from the two arrays based on certain keys. For example:
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
and criteria keys a and b . So when I get the result of something like
array1-array2 (don't have to overwrite '-' if there better approach)
it expects to get [{a: '4', b: '5', c: '6'}] sine we used a and b as comparison criteria. It will destroy the second element since the value for a is different from array1.last and array2.last.
source to share
As I understand it, you are given two arrays of hashes and a set of keys. You want to reject all elements (hashes) of the first array whose values ββmatch the values ββof any element (hash) of the second array for all the specified keys. You can do it like this.
code
require 'set'
def reject_partial_dups(array1, array2, keys)
set2 = array2.each_with_object(Set.new) do |h,s|
s << h.values_at(*keys) if (keys-h.keys).empty?
end
array1.reject do |h|
(keys-h.keys).empty? && set2.include?(h.values_at(*keys))
end
end
Line:
(keys-h.keys).empty? && set2.include?(h.values_at(*keys))
can be simplified to:
set2.include?(h.values_at(*keys))
if none of the key values ββin the elements (hashes) array1
are equal nil
. I created a set (not an array) of array2
to speed up searching h.values_at(*keys)
this string.
Example
keys = [:a, :b]
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}, {a: 1, c: 4}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
reject_partial_dups(array1, array2, keys)
#=> [{:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]
Explanation
First create set2
e0 = array2.each_with_object(Set.new)
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"10"}, {:a=>"3", :b=>"5", :c=>"6"}]
# #:each_with_object(#<Set: {}>)>
Pass the first element e0
and calculate the block.
h,s = e0.next
#=> [{:a=>"1", :b=>"2", :c=>"10"}, #<Set: {}>]
h #=> {:a=>"1", :b=>"2", :c=>"10"}
s #=> #<Set: {}>
(keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => [].empty? => true
so calculate:
s << h.values_at(*keys)
#=> s << {:a=>"1", :b=>"2", :c=>"10"}.values_at(*[:a,:b] }
#=> s << ["1","2"] => #<Set: {["1", "2"]}>
Pass the second (last) element e0
to the block:
h,s = e0.next
#=> [{:a=>"3", :b=>"5", :c=>"6"}, #<Set: {["1", "2"]}>]
(keys-h.keys).empty?
#=> true
so calculate:
s << h.values_at(*keys)
#=> #<Set: {["1", "2"], ["3", "5"]}>
set2
#=> #<Set: {["1", "2"], ["3", "5"]}>
Reject items from array1
We now iterate through array1
, rejecting the items for which the block is evaluating true
.
e1 = array1.reject
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"3"},
# {:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]:reject>
The first element e1
is passed to the block:
h = e1.next
#=> {:a=>"1", :b=>"2", :c=>"3"}
a = (keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["1","2"] => true
a && b
#=> true
so the first element is e1
rejected. Further:
h = e1.next
#=> {:a=>"4", :b=>"5", :c=>"6"}
a = (keys-h.keys).empty?
#=> true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["4","5"] => false
a && b
#=> false
therefore the second element is e1
not rejected. Finally:
h = e1.next
#=> {:a=>1, :c=>4}
a = (keys-h.keys).empty?
#=> ([:a,:c]-[:a,:b]).empty? => [:c].empty? => false
so return true (meaning the last item is e1
not rejected) since there is no need to compute:
b = set2.include?(h.values_at(*keys))
source to share
So, you really should try this yourself, because I will basically solve it for you.
General approach:
- For each time in array1
- Make sure the same value in array2 has any keys and values ββwith the same value
- If so, remove it
You will probably have something like array1.each_with_index { |h, i| h.delete_if {|k,v| array2[i].has_key?(k) && array2[i][k] == v } }
source to share