Algorithm for generating each combination of n elements divided into k sets

I want to create every distinct way of splitting the 26 lowercase letters into two sets of six letters and two sets of seven letters. The order of the letters within a set does not matter; that is, if two solutions differ only in the order of the letters within the sets, then they are identical.

For example, these two solutions are identical:

[a, b, c, d, e, f] [g, h, i, j, k, l] [m, n, o, p, q, r, s] [t, u, v, w, x, y, z]
[f, b, c, d, e, a] [l, h, i, j, k, g] [s, n, o, p, q, r, m] [z, u, v, w, x, y, t]

A naive approach might be to generate every permutation of the 26 letters plus 2 dummies, distribute them evenly across the four groups (seven each), discard duplicate solutions, and ignore the dummies when I use the data. But that seems pretty inefficient. I'm sure there is a known algorithm for this, but I'm struggling to find it among the wide range of similar but different permutation/combination problems.

Is there an existing named algorithm that can split nk elements into n sets of k elements, generating every combination of these sets? If not, I'll have to hack something together myself, but this looks like a problem that has already been solved.



2 answers


I don't know of a name for this algorithm (although one probably exists), but the approach I mentioned in the comments avoids duplicates and is, I think, about as efficient as you can get.

It seems like you could improve the situation by turning the problem on its head: every letter has to go into one of the four buckets, and the buckets have limited space, so recursively try putting each letter into every bucket that still has room for it. This way you generate combinations, not permutations.

Here's a C# implementation. It can generate 10,000,000 combinations in under 30 seconds, and two thirds of that time is spent just building the output strings:

using System;
using System.Collections.Generic;
using System.Linq;

public class Bucket : List<char>
{
    // The fixed number of letters this bucket may hold.
    public int MaxLoad { get; private set; }
    public Bucket(int capacity) : base(capacity)
    {
        MaxLoad = capacity;
    }
}

public static class Program
{
    public static void Main()
    {
        // Tweak these starting values to create smaller subsets if you want.
        var letters = Enumerable.Range(0, 26).Select(i => (char)('a' + i)).ToList();
        var buckets = new[] { new Bucket(6), new Bucket(6), new Bucket(7), new Bucket(7) };
        // I'm only taking 100 values because otherwise this would take a really long time.
        var combos = Combos(letters, 0, buckets).Take(100);
        foreach (var combo in combos)
        {
            Console.WriteLine(combo);
        }
    }

    // Puts letters[currentIndex] into every bucket that still has room, recurses
    // on the next letter, and yields one string per complete assignment.
    static IEnumerable<string> Combos(IList<char> letters, int currentIndex, Bucket[] buckets)
    {
        if (currentIndex == letters.Count)
        {
            yield return string.Join("|", buckets.Select(b => string.Join(",", b)));
            yield break;
        }
        var currentLetter = letters[currentIndex];
        foreach (var bucket in buckets)
        {
            // Compare against MaxLoad; List<T>.Capacity is only an allocation size.
            if (bucket.Count < bucket.MaxLoad)
            {
                bucket.Add(currentLetter);
                foreach (var possibility in Combos(letters, currentIndex + 1, buckets))
                {
                    yield return possibility;
                }
                // Undo the placement; deeper calls have already undone theirs.
                bucket.RemoveAt(bucket.Count - 1);
            }
        }
    }
}

      



Output example:

a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,s|t,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,t|s,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,u|s,t,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,v|s,t,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,w|s,t,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,x|s,t,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,y|s,t,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,z|s,t,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,t|r,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,u|r,t,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,v|r,t,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,w|r,t,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,x|r,t,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,y|r,t,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,z|r,t,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,u|r,s,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,v|r,s,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,w|r,s,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,x|r,s,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,y|r,s,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,z|r,s,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,v|r,s,t,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,w|r,s,t,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,x|r,s,t,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,y|r,s,t,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,z|r,s,t,v,w,x,y
...
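The generator above treats all four buckets as distinct, so a split and the same split with its two 6-letter sets (or its two 7-letter sets) swapped are both produced. If, as an assumption, swapping equal-size sets should also count as the same solution, a common symmetry-breaking tweak is to let a letter open an empty bucket only when no earlier bucket of the same size is still empty. This is a minimal sketch with hypothetical names (`UnorderedSplits`, `Splits`, `HasEarlierEmptyTwin`), not part of the original answer:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnorderedSplits
{
    // Hypothetical variant of Combos: buckets with equal capacity are assumed
    // interchangeable, and each unordered split is yielded exactly once.
    public static IEnumerable<string> Splits(IList<char> letters, int[] capacities)
    {
        var buckets = capacities.Select(c => new List<char>(c)).ToArray();
        return Place(letters, 0, capacities, buckets);
    }

    static IEnumerable<string> Place(IList<char> letters, int index, int[] capacities, List<char>[] buckets)
    {
        if (index == letters.Count)
        {
            yield return string.Join("|", buckets.Select(b => string.Join(",", b)));
            yield break;
        }
        var letter = letters[index];
        for (var i = 0; i < buckets.Length; i++)
        {
            if (buckets[i].Count == capacities[i])
            {
                continue; // bucket i is full
            }
            // Symmetry breaking: an empty bucket may only be opened if no earlier
            // bucket of the same capacity is still empty.
            if (buckets[i].Count == 0 && HasEarlierEmptyTwin(i, capacities, buckets))
            {
                continue;
            }
            buckets[i].Add(letter);
            foreach (var split in Place(letters, index + 1, capacities, buckets))
            {
                yield return split;
            }
            buckets[i].RemoveAt(buckets[i].Count - 1);
        }
    }

    // True if some bucket before i has the same capacity and is still empty.
    static bool HasEarlierEmptyTwin(int i, int[] capacities, List<char>[] buckets)
    {
        for (var j = 0; j < i; j++)
        {
            if (capacities[j] == capacities[i] && buckets[j].Count == 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

Because letters are visited in a fixed order, the first letter placed in a bucket is its smallest, so equal-size buckets always come out ordered by their smallest letter. Like the original, this is lazy, so `Splits(letters, new[] { 6, 6, 7, 7 }).Take(100)` only does the work needed for 100 results.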

      

One of the nice things about this approach is that you can process the results as they are generated: you don't need to wait for the entire list to be built, and you don't need to hold all the combinations in memory at the same time.

But keep in mind that there are a great many combinations, perhaps more than a computer can generate in any reasonable time, regardless of how efficient the algorithm is. If Vincent's estimate of 10^12 is correct, for example, that would take about a year using the code above. You might be able to optimize it down to a month or so, and parallelizing it might get that down to about a week on a really powerful computer.
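For a sense of scale, the number of results can be computed exactly instead of estimated. The sketch below uses hypothetical `SplitCount` helpers, not from the original answer, and relies on the fact that a split with labeled buckets amounts to choosing 6 of the 26 letters for the first set, 6 of the remaining 20 for the second, 7 of the remaining 14 for the third, and the last 7 for the fourth:

```csharp
using System.Linq;

public static class SplitCount
{
    // C(n, k) by the multiplicative formula; after step i the running value is
    // C(n - k + i, i), so every division is exact.
    public static long Binomial(int n, int k)
    {
        long result = 1;
        for (var i = 1; i <= k; i++)
        {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Labeled buckets: choose the first bucket's items from all of them, the
    // second bucket's from what is left, and so on (a multinomial coefficient).
    public static long LabeledSplits(int[] sizes)
    {
        var remaining = sizes.Sum();
        long total = 1;
        foreach (var size in sizes)
        {
            total *= Binomial(remaining, size);
            remaining -= size;
        }
        return total;
    }

    // Assumes all sizes are positive and that buckets of equal size are
    // interchangeable: divide out m! for each size that occurs m times.
    public static long UnlabeledSplits(int[] sizes)
    {
        var total = LabeledSplits(sizes);
        foreach (var group in sizes.GroupBy(s => s))
        {
            total /= Factorial(group.Count());
        }
        return total;
    }

    static long Factorial(int m)
    {
        long result = 1;
        for (var i = 2; i <= m; i++)
        {
            result *= i;
        }
        return result;
    }
}
```

For the 6/6/7/7 split this gives C(26,6) · C(20,6) · C(14,7) = 230,230 × 38,760 × 3,432 = 30,626,189,193,600 when the four sets are distinct, which is what the code above enumerates, and a quarter of that, 7,656,547,298,400, if the two 6-letter sets and the two 7-letter sets are interchangeable.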



This is a recursion problem.

If I wanted to find all sets of length n that contain some letter, the easiest way to think about it is to list all sets of length n-1 that do not contain that letter and concatenate the letter onto each one. To avoid duplicates, you discard all the letters you have already processed.

For example, if I wanted to find all two-letter combinations of the set [A-F], the answer is to take each element and find its combinations. Say I first want all combinations containing A: those are A paired with each of [B-F]. Next I want all combinations containing B but not A: those are B paired with each of [C-F]. Doing this for every letter from A to F gets you all two-letter combinations of A-F, and those combinations then become your tails for the three-letter combinations.

You would add A to all two-letter combinations that do not contain A, then add B to all two-letter combinations that contain neither A nor B, and continue like that to get all three-letter combinations.



You can continue this algorithm for as many levels as you want, and it will find all combinations of a given length from the elements of a set.

I know you aren't looking for code, but here is a C# implementation:

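using System.Collections.Generic;
using System.Linq;

public static class Combinations
{
    // Returns every combination of `level` elements, each as a concatenated string.
    // Assumes every element is a distinct single character, because the
    // duplicate check below uses substring matching (string.Contains).
    public static IList<string> Combos(IList<string> elements, int level)
    {
        if (level == 1)
        {
            return elements;
        }
        var combinations = new List<string>();
        var previousCombos = Combos(elements, level - 1);
        for (var i = 0; i < elements.Count; i++)
        {
            foreach (var item in previousCombos)
            {
                // Only extend combinations that use none of elements[0..i],
                // so each combination is produced exactly once.
                if (!elements.Take(i + 1).Any(item.Contains))
                {
                    combinations.Add(item + elements[i]);
                }
            }
        }
        return combinations;
    }
}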

      

Just a word of warning: this is incredibly inefficient (in fact, I believe it is an exponential algorithm), so don't use it on large datasets or sizes, or your calculations will take forever.
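For comparison, a more conventional way to generate k-element combinations, swapped in here rather than taken from the answer above, is to recurse on a start index so that each chosen element is only ever followed by later ones. That needs no filtering or rescanning of earlier results, and each combination comes out exactly once, in lexicographic order of positions. A minimal sketch with illustrative names (`IndexCombinations`, `Choose`, `Extend`):

```csharp
using System.Collections.Generic;

public static class IndexCombinations
{
    // Yields every k-element combination of `elements` as a new list.
    public static IEnumerable<List<T>> Choose<T>(IList<T> elements, int k)
    {
        // A fresh buffer per enumeration, so the sequence can be enumerated again.
        foreach (var combo in Extend(elements, k, 0, new List<T>(k)))
        {
            yield return combo;
        }
    }

    static IEnumerable<List<T>> Extend<T>(IList<T> elements, int k, int start, List<T> current)
    {
        if (current.Count == k)
        {
            // Copy, because `current` keeps changing as the search continues.
            yield return new List<T>(current);
            yield break;
        }
        // Stop early once too few elements remain to fill the combination.
        for (var i = start; i <= elements.Count - (k - current.Count); i++)
        {
            current.Add(elements[i]);
            foreach (var combo in Extend(elements, k, i + 1, current))
            {
                yield return combo;
            }
            current.RemoveAt(current.Count - 1);
        }
    }
}
```

Applied repeatedly to the leftover letters (choose 6 of 26, then 6 of the remaining 20, and so on), the same routine could also enumerate the set splits from the question.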
