Intersection of all combinations of n sets

I need help finding an efficient algorithm to solve this problem:

Given n unsorted sets of integers, find all possible combinations of n and their intersections.

For example:

Input (n = 3):

Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4

Output:

Set 1 and 2: 3, 11
Set 1 and 3: 1, 6
Set 2 and 3: 9, 11
Set 1, 2 and 3: 11

I thought to first find all possible combinations of sets and then use an algorithm to find the intersection of the n sets found here . However, I am concerned about the temporal effectiveness of this approach.

If you can find something better than my naive approach, the pseudocode answer will be most helpful.

+3


source to share


2 answers


Here is a solution inspired by http://research.google.com/archive/mapreduce.html (so it can be written in a distributed fashion if you want).

Match all of your items in sets with pairs [element, set]

. Group this list by item. (You can just sort and pop the elements. Or you can create a hash of arrays whose keys are the elements and the values ​​are the list of sets that the element is in.) Then, for each element, emit a list of [combination of sets, element]. Group it up by combining. And now, for each non-empty combination, you can simply read the elements in it.

If you want to distribute computations with actual map reduction, the first map will map to the item key and set value. The first shortcut will simply store the list of sets it is in, by item. The second map will allocate for each element one key for each combination of sets in which it is located, and this element is a value. And the second cut will save your final answer.

This will help you go through your example in detail. You start with:

Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4

      

List this:

[1, Set 1], [10, Set 1], [6, Set 1], [11, Set 1], [14, Set 1], [3, Set 1],
[3, Set 2], [7, Set 2], [11, Set 2], [9, Set 2], [5, Set 2],
[11, Set 3], [6, Set 3], [9, Set 3], [1, Set 3], [4, Set 3],

      



Now by element (I did it manually by sorting) to get:

1: Set 1, Set 3
3: Set 1, Set 2
4: Set 3
5: Set 2
6: Set 1, Set 3
7: Set 2
9: Set 2, Set 3
10: Set 1
11: Set 1, Set 2, Set 3
14: Set 1

      

And now we do a second mapping (skipping elements that are only in one set) to get:

[(Set 1, Set 3), 1],
[(Set 1, Set 2), 3],
[(Set 1, Set 3), 6],
[(Set 2, Set 3), 9],
[(Set 1, Set 2), 11], [(Set 1, Set 3), 11], [(Set 2, Set 3), 11], [(Set 1, Set 2, Set 3), 11]

      

We group that by a combination of sets and we get:

(Set 1, Set 2): 3, 11
(Set 1, Set 3): 1, 6, 11
(Set 2, Set 3): 9, 11
(Set 1, Set 2, Set 3): 11

      

(This differs from your suggested answer because you missed that 11 at the intersection of sets 1 and 3.)

+11


source


Using your example, just note that

1 n 2 n 3

also



(1 n 2) n 3

So, if you cache your (1 n 2) solution, you can reuse it, so it only takes one additional intersection calculation to compute 1 n 2 n 3. Typically, if there are four possible sets of combinations in your example, then only four intersections will ultimately need to be performed if you save and reuse the previous results.

+1


source







All Articles