Ruby - Show Deltas Between two arrays of hashes based on a subset of hash keys
I'm trying to compare two arrays of hashes with a very similar hash structure (identical and always present keys) and return the delta between them - in particular, I would like to capture the following:
- Hashes that are part of array1 but do not exist in array2
- Hashes that are part of array2 but do not exist in array1
- Hashes that appear in both datasets
This can usually be achieved by simply doing the following:
deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)
The problem for me (which turned into a 2-3 hour struggle!) is that I need to determine the delta based on the values of 3 keys in the hash ('id', 'ref', 'name'). The values of these 3 keys are what make up a unique entry in my data, but I still have to keep the other hash key/value pairs (for example 'extra', and many other key/value pairs not shown for brevity).
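Why plain Array#- is not enough here: it treats two hashes as equal only when all of their key/value pairs match (it compares elements with hash and eql?), so hashes that agree on 'id', 'ref' and 'name' but differ in 'extra' are still considered different. The sketch below uses a made-up one-row pair of arrays (hypothetical data, not from the question) and a hypothetical KEY_FIELDS constant to show this, and shows that projecting each hash onto the three key fields with Hash#values_at makes the rows compare equal.

```ruby
KEY_FIELDS = ['id', 'ref', 'name']

# Hypothetical data: same identity fields, different 'extra' value
old_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'old' }]
new_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'new' }]

# Array#- compares whole hashes, so this row is NOT treated as a match
whole_hash_delta = old_rows - new_rows
p whole_hash_delta.size # => 1

# Projecting onto the identity fields makes the two rows compare equal
old_keys = old_rows.map { |h| h.values_at(*KEY_FIELDS) }
new_keys = new_rows.map { |h| h.values_at(*KEY_FIELDS) }
key_delta = old_keys - new_keys
p key_delta # => []
```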
Sample data:
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Expected result (3 separate arrays of hashes):
Objects containing data in array1 but not in array2:
[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
Objects containing data in array2 but not in array1:
[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Objects containing data in BOTH array1 and array2:
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]
I have tried iterating over the arrays and comparing them with Hash#keep_if based on the 3 keys, and also merging both datasets into one array and then deduplicating based on array1, but I keep coming up empty-handed. Thank you in advance for your time and help!
It's not pretty, but it works. It creates a third array of all the unique entries (by the three key fields) from array1 and array2 and iterates through it.
Then, since include? doesn't allow a custom matcher, we can fake it with detect and search the array for an element whose 'id', 'ref' and 'name' sub-hash matches. We'll wrap this in a custom method so we can call it passing in array1 or array2 instead of writing the logic twice.
Finally, we loop through array3, determine whether each item came from array1, array2 or both, and add it to the appropriate output array.
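A small illustration of the detect trick described above, separate from the full code: Array#include? compares with ==, so it cannot match on a subset of keys, while detect returns the first element for which a custom block is true (or nil). The helper name matches_on? and the KEY_FIELDS constant are hypothetical; any? is shown as an alternative that returns a plain true/false.

```ruby
KEY_FIELDS = ['id', 'ref', 'name']

# Hypothetical helper: true when two hashes agree on every identity field
def matches_on?(a, b, fields = KEY_FIELDS)
  a.values_at(*fields) == b.values_at(*fields)
end

rows  = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' }]
probe = { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'something else' }

p rows.include?(probe)                  # => false (whole-hash comparison)
match = rows.detect { |r| matches_on?(r, probe) }
p match.equal?(rows.first)              # => true (detect returns the element itself)
p rows.any? { |r| matches_on?(r, probe) } # => true
```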
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
# combine the arrays into 1 array that contains items in both array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }
# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }
array.detect do |item|
{ 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
end
end
# initialize the output arrays
array1_only = []
array2_only = []
array1_and_array2 = []
# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
in_array1 = is_included_in(array1, item)
in_array2 = is_included_in(array2, item)
if in_array1 && in_array2
array1_and_array2.push item
elsif in_array1
array1_only.push item
else
array2_only.push item
end
end
puts array1_only.inspect # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
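The detect calls above rescan array1 and array2 for every item in array3, which is O(n*m). A sketch of a swapped-in variant (not the approach above) builds a Set of identity tuples for each array once, so each membership check is a hash lookup. KEY_FIELDS and key_of are hypothetical names, and the data is the question's sample data.

```ruby
require 'set'

KEY_FIELDS = ['id', 'ref', 'name']

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

# Hypothetical helper: the identity tuple of a row
def key_of(row)
  row.values_at(*KEY_FIELDS)
end

# Build each lookup set once
keys1 = array1.map { |row| key_of(row) }.to_set
keys2 = array2.map { |row| key_of(row) }.to_set

# Rows whose identity tuple is missing from the other array
array1_only = array1.reject { |row| keys2.include?(key_of(row)) }
array2_only = array2.reject { |row| keys1.include?(key_of(row)) }
# Rows of array1 whose identity tuple also appears in array2
array1_and_array2 = array1.select { |row| keys2.include?(key_of(row)) }

p array1_only.map { |row| row['id'] }       # => ["2", "7"]
p array2_only.map { |row| row['id'] }       # => ["8", "5", "12"]
p array1_and_array2.map { |row| row['id'] } # => ["1", "3"]
```

For rows present in both arrays this keeps the array1 version (its 'extra' and other fields); select from array2 instead if those values should win. Unlike the uniq step above, duplicates within one array are not collapsed.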
For this type of problem, it is usually easier to work with indexes.
Code
def keepers(array1, array2, keys)
a1 = make_hash(array1, keys)
a2 = make_hash(array2, keys)
common_keys_of_a1_and_a2 = a1.keys & a2.keys
[keeper_idx(array1, a1, common_keys_of_a1_and_a2),
keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end
def make_hash(arr, keys)
arr.each_with_index.with_object({}) do |(g,i),h|
(h[g.values_at(*keys)] ||= []) << i
end
end
def keeper_idx(arr, a, common_keys_of_a1_and_a2)
arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end
Example
array1 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Please note that the two arrays are slightly different from those in the question. The question did not say whether two hashes in the same array can have the same values for the specified keys, so I added one such hash to each array to show how that case is handled.
keys = ['id', 'ref', 'name']
idx1, idx2 = keepers(array1, array2, keys)
#=> [[1, 4], [2, 3, 4, 5]]
idx1 (idx2) are the indices of the elements of array1 (array2) that remain after the matches are removed. (array1 and array2 are not modified.)
The hashes that remain in each array are therefore
array1.values_at(*idx1)
#=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
and
array2.values_at(*idx2)
#=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
# {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
The indices of the hashes that are removed are as follows.
array1.size.times.to_a - idx1
#=> [0, 2, 3]
array2.size.times.to_a - idx2
#=> [0, 1]
Explanation
Following are the steps.
a1 = make_hash(array1, keys)
#=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
# ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}
a2 = make_hash(array2, keys)
#=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
# ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
# ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
#=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
#=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
#=> [2, 3, 4, 5] (for array2)
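The steps above return only the "not matched" index lists, while the question also asks for the hashes present in both arrays. A possible sketch, reusing make_hash from this answer (copied so the block runs on its own) with this answer's example arrays; taking the array1 copies of the matched hashes is an assumption, and both_idx1 is a hypothetical name.

```ruby
def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

array1 =
  [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 =
  [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
keys = ['id', 'ref', 'name']

a1 = make_hash(array1, keys)
a2 = make_hash(array2, keys)
common_keys_of_a1_and_a2 = a1.keys & a2.keys

# Indices in array1 of every hash whose key tuple also occurs in array2
both_idx1 = a1.values_at(*common_keys_of_a1_and_a2).flatten
#=> [0, 2, 3]
both = array1.values_at(*both_idx1)
p both.map { |h| h['extra'] }
#=> ["Not Sorted On 5", "Not Sorted On 9", "Not Sorted On 8"]
```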
array1 - array2 #data in array1 but not in array2
array2 - array1 #data in array2 but not in array1
array1 & array2 #data in both array1 and array2
Since you tagged this question with set, you can use sets in a similar way:
require 'set'
set1 = array1.to_set
set2 = array2.to_set
set1 - set2
set2 - set1
set1 & set2
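Set#- and Set#& return Set objects rather than arrays, and like Array#- and Array#& they compare whole hashes, so the result matches the expected output only when matching hashes are identical in every field (as they happen to be in the question's sample data). A minimal sketch with a shortened version of the sample data, plus one hypothetical row whose 'extra' value differs between the arrays, converting back with to_a:

```ruby
require 'set'

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]
# Hypothetical: the 'id' => '3' row differs from array1 only in 'extra'
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Changed'}]

set1 = array1.to_set
set2 = array2.to_set

only1 = (set1 - set2).to_a # hashes in array1 but not in array2
only2 = (set2 - set1).to_a # hashes in array2 but not in array1
both  = (set1 & set2).to_a # hashes equal in every field

p only1.map { |h| h['id'] } # => ["2", "3"]
p only2.map { |h| h['id'] } # => ["8", "3"]
p both.map { |h| h['id'] }  # => ["1"]
```

The 'id' => '3' row lands in both "only" results because its 'extra' differs; matching on just 'id', 'ref' and 'name' needs a key-based comparison like the ones in the earlier answers.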