Substituting redundant subsets

I have a set of sets of two-element tuples (the tuples are ordered by their second element):

{
{(1,"c1"), (1,"c2"), (1,"c3"), (1,"c5")},
{(1,"c1"), (2,"c3"), (1,"c5")},
{(1,"c1"), (1,"c3"), (1,"c5")},
}

      

The task is to replace redundant subsets of items. Given a number k >= 1, replace every subset of at least k elements that appears in more than one of the sets. The substituted subsets should be as large as possible. In abbreviated form:

{
{z, (1,"c2")},
{(1,"c1"), (2,"c3"), (1,"c5")},
{z},
}
z := {(1,"c1"), (1,"c3"), (1,"c5")}

      

My naive approach would be to start with the first set and, for each other set, calculate how many elements they have in common. Then choose the set with the most matches and substitute the shared subset. Repeat this process as long as some other set still has at least k matches. Then move on to the next set and do the same, ignoring all the sets already processed.

Two elements are equal if both of their values are the same. The first value is a floating-point number. The second value is a string, although for performance it could probably be replaced with a number first.

It looks a lot like data compression. Is there a better algorithm for calculating this? Is there a good data structure for this purpose?



2 answers


If the number of distinct values (cx) is smaller than the number of input sets, you can find the largest substitution more quickly with this alternative method:

Problem Statement

We clarified in the comments that, with z := (1,"c3"), we can partially substitute (2,"c3") as {z, (1,"c3")}.

Given this, the fact that the data points are tuples is irrelevant to the question, which allows me to treat your input as:

  {c1, c2, c3, c5}
  {c1, c3, c5}
  {c1, c3, c5}

      

The challenge is to find the largest set {ca, cb, ...} that is a subset of at least 2 input sets.

Input sets can be expressed as

     | c1 | c2 | c3 | c5 |
+----|----|----|----|----|
| S1 | 1  | 1  | 1  | 1  |
| S2 | 1  | 0  | 1  | 1  |
| S3 | 1  | 0  | 1  | 1  |
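The table above can be built mechanically. The following is a minimal sketch, assuming at most 32 distinct labels and assigning bit positions in order of first appearance (so c1 is bit 0, c2 is bit 1, and so on). The function name encodeSets and the bitOf map are hypothetical, not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: give each distinct label a bit position and encode
// every input set as one 32-bit mask. Assumes at most 32 distinct labels.
std::vector<std::uint32_t> encodeSets(
    const std::vector<std::vector<std::string>>& sets,
    std::map<std::string, int>& bitOf)
{
    std::vector<std::uint32_t> masks;
    for (const auto& s : sets) {
        std::uint32_t mask = 0;
        for (const auto& label : s) {
            // The first time a label is seen, it gets the next free bit.
            auto it = bitOf.find(label);
            if (it == bitOf.end()) {
                assert(bitOf.size() < 32 && "too many distinct labels");
                const int nextBit = static_cast<int>(bitOf.size());
                it = bitOf.emplace(label, nextBit).first;
            }
            mask |= std::uint32_t{1} << it->second;
        }
        masks.push_back(mask);
    }
    return masks;
}
```

With the three sets from the table, S1 would have four bits set and S2 and S3 would share the same three-bit mask.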

      

Solution

Solution A will have the form



     | c1 | c2 | c3 | c5 | subset of |
+----|----|----|----|----|-----------|
| A  | 1  | 1  | 1  | 1  | k         |

      

Here k is the number of input sets that A is a subset of; it must be at least 2. With n values of cx, A has 2^n possible values. To test whether a candidate solution Ax is a subset of a set Sx, bitwise AND the bit field Ax with the bit field Sx and check that the result equals Ax.

Iterate through the possible values of Ax, starting with those that have the most bits set; the first value of A with k > 1 is the desired solution.

| s  | c1 | c2 | c3 | c5 | subset of |
|----|----|----|----|----|-----------|
| A1 | 1  | 1  | 1  | 1  | 1         |
| A2 | 1  | 1  | 1  | 0  | 1         |
| A3 | 1  | 1  | 0  | 1  | 1         |
| A4 | 1  | 0  | 1  | 1  | 3 <-bingo!|
| A5 | 0  | 1  | 1  | 1  |           |
| A6 | 1  | 1  | 0  | 0  |           |
| A7 | 0  | 1  | 1  | 0  |           |
| A  ...                             |
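The search in the table can be sketched in code. This is only an illustration under stated assumptions: sets are already encoded as 32-bit masks (bit 0 = c1, bit 1 = c2, bit 2 = c3, bit 3 = c5), n is below 32, and the names countBits, isSubset and largestSharedSubset are hypothetical. For simplicity it scans all 2^n candidates in numeric order and keeps the best one, rather than visiting them by decreasing bit count, so it never stops early.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of set bits in x (portable, no compiler intrinsics).
int countBits(std::uint32_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clears the lowest set bit each step
    return n;
}

// Candidate a is a subset of set s exactly when a has no bit outside s.
bool isSubset(std::uint32_t a, std::uint32_t s) {
    return (a & s) == a;
}

// Returns the candidate with the most bits that is a subset of at least
// minSets of the input masks, or 0 if no non-empty candidate qualifies.
std::uint32_t largestSharedSubset(const std::vector<std::uint32_t>& sets,
                                  int n, int minSets) {
    assert(n >= 0 && n < 32);
    std::uint32_t best = 0;
    const std::uint32_t limit = std::uint32_t{1} << n;
    for (std::uint32_t a = 1; a < limit; ++a) {
        if (countBits(a) <= countBits(best)) continue;  // cannot improve
        int count = 0;
        for (std::uint32_t s : sets) {
            if (isSubset(a, s)) ++count;
        }
        if (count >= minSets) best = a;
    }
    return best;
}
```

For the example input (S1 with all four bits, S2 and S3 with c1, c3 and c5), the result is the mask for {c1, c3, c5}, matching row A4.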

      

Complexity

The time complexity of this solution is still exponential. However, it is exponential in the number of cx values, whereas the naive solution is exponential in the number of input sets. If there are fewer distinct values (cx) than input sets (Sx), this method will be faster.

Note: if the assumption at the top of this answer about (2,"c3") is wrong, you just need to treat (2,"c3") as a separate cx value. The input will then be:

{c1, c2, c3, c5}
{c1, c6, c5}
{c1, c3, c5}

      

and the rest of this answer remains valid.



You can use maps for this purpose. Treat the set of sets of tuples as a set of maps. The problem then becomes whether a map 'z' is a subset of each map. If it is a subset, we output the map 'z' as z and the remaining elements of the map as pairs; if not, we just output the map unchanged.

To find out whether one map is a subset of another map, we can use the STL function std::includes:



includes tests whether one sorted range includes another sorted range. That is, it returns true if and only if, for each element in [first2, last2), an equivalent element is also present in [first1, last1). Both [first1, last1) and [first2, last2) must be sorted in ascending order.

The two versions of includes differ in how they determine whether one element is less than another. The first version compares objects using operator<, and the second compares objects using the function object comp.

Here is a program that prints whether one map is a subset of another. Adapt it to your requirements; writing the output part yourself should be easy:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>

int main()
{
    // std::map keeps its entries sorted by key, which std::includes requires.
    std::map<int, std::string> a, b;

    a[0] = "0";
    a[1] = "1";
    b[0] = "0";

    // b = {0} is contained in a = {0, 1}.
    std::cout << "b ⊆ a? " << std::includes(a.begin(), a.end(), b.begin(), b.end()) << " (will be 1)\n";

    // b = {0, 1} equals a, so it is still a subset.
    b[1] = "1";
    std::cout << "b ⊆ a? " << std::includes(a.begin(), a.end(), b.begin(), b.end()) << " (will be 1)\n";

    // b = {0, 1, 2} has an entry that a lacks.
    b[2] = "2";
    std::cout << "b ⊆ a? " << std::includes(a.begin(), a.end(), b.begin(), b.end()) << " (will be 0)\n";
}
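The same idea can be applied to the tuples from the question. The following is a rough sketch, assuming each item is stored as a std::pair<double, std::string> in a std::set (so it is sorted by the number first, then the string). The names Item, ItemSet, isSubsetOf and withoutZ are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

using Item = std::pair<double, std::string>;
using ItemSet = std::set<Item>;  // sorted with the default pair ordering

// True if every item of z also occurs in s.
bool isSubsetOf(const ItemSet& z, const ItemSet& s) {
    return std::includes(s.begin(), s.end(), z.begin(), z.end());
}

// Items of s that remain after substituting z. Requires z to be a subset
// of s; the caller would then print "z" followed by these leftover pairs.
ItemSet withoutZ(const ItemSet& s, const ItemSet& z) {
    assert(isSubsetOf(z, s));
    ItemSet rest;
    std::set_difference(s.begin(), s.end(), z.begin(), z.end(),
                        std::inserter(rest, rest.end()));
    return rest;
}
```

With z = {(1,"c1"), (1,"c3"), (1,"c5")}, the first set of the question would reduce to z plus (1,"c2"), while the second set, which contains (2,"c3"), would be left unchanged because z is not a subset of it.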

      
