Install combinatorial algorithm in Java

I have a dataset with attributes like this:

Marital_status = {M,S,W,D}
IsBlind = {Y,N}
IsDisabled = {Y,N}
IsVetaran = {Y,N}

      

etc. There are about 200 such variables.

I need an algorithm to generate combinations of attributes with one value at a time.

In other words, my first combination would be:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = Y

      

Next set:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = N

      

I tried to use a simple combination generator, treating each value for each attribute as the attribute itself. It didn't work because mutually exclusive options are included in combinations and the number of possible combinations was really huge (133873417996074857185490633899939406700260683726864088366400 to be precise)

Could you please suggest an algorithm (preferably coded in Java)?

Thank!!

+2


source to share


3 answers


As others (and yourself) have pointed out, it is impossible to fully verify this.

I suggest you take a sampling approach and test it. You have a strong theoretical foundation, so you should be able to find your way around the internet to find and understand this.


But let me give you a small example. For now, I will ignore the possible "clusters" of parameters (which are highly related).

  • Create a sample data containing all possible values ​​for all of your 200 parameters. This completeness ensures that the parameter value cannot be forgotten.

    It doesn't have to be created beforehand, values ​​can be created using a loop.

  • Additional values ​​must be added to each sample data. A simple approach would be to select multiple times when you want to test every single sample (say N = 100). For each sample of one of the data, you randomly generate other values ​​N times .



If there are 1000 possible values ​​using all 200 parameters and N = 100, this will give us 100k tests.


You can elaborate on this basic idea:

  • If you want your test to be repeatable , you can create it only once, save it, and then reuse the same sets in all future tests.
  • You can control the distribution so that each value is fetched many times .
  • In real life, all 200 parameters will have no connections. Many of the parameters will indeed be related to some others, since the chances of values ​​being found together are slim. Instead of making the original exhaustive set on just one parameter, as I did earlier,
    I would perform an exhaustive set in a cluster of related parameters .
+5


source


Find another way. If you have 200 variables and each has at least 2 options, you will have a combination> = 2 ^ 200. If you create one combination for every nanosecond, it will take about 10 ^ 43 years to list 2 ^ 200 options.



+5


source


As Keith pointed out , the number of combinations would be impossibly large if there were no excluded combinations, which would make your need impossible. However, since you have already said that you have mutually exclusive choices, the decision space will be smaller.

How much less? Depending on how many options are mutually exclusive. I recommend doing some math before going too hard.

Assuming enough choice is exclusive, you still have to essentially coerce it, but you are unlikely to find an existing useful algorithm.

Which brings me to the question: what is your reason for doing this - exhaustive testing? Sounds good, but you may find it impossible. I ran into this problem myself, and at the end of the day, you may well be prompted by some combination of carefully selected "edge" cases, as well as some quasi-randomly selected other cases.

After reading your comment above, it seems you are defining "mutual exclusion" differently than I am, and I'm afraid you might have a problem.

Thus, this patient is neither blind nor blind. Fine. But that's not what I (and I suspect everyone here is) figured out when you mentioned mutual exceptions.

To these, I say, for example, if the blind cannot be disabled or something like that.

Without a significant number of mutually exclusive relationships between your attributes that constrain their combinations, you cannot complete exhaustive testing.

+3


source







All Articles