Generating random fixed permutations of string length

Question

Generating random fixed permutations of string length

Let's say that my alphabet contains the letters X and my language only supports Y letter words (Y <X of the course). I need to generate all the words at random.

eg. Alphabet = a, b, c, d, e, f, f Y = 3

So, the words will be: aaa aab AAC aba .. BBB sss .. (the above should be generated in random order)

A trivial way to do this is to generate the words and then randomize the list. I SHOULD NOT do this. I want to generate words in random order.

rondom (n) = letter [x] .random (n-1) will not work, because then you will have a list of words starting with the letter [x] .. which will make the list not so random.

Any code / pseudocode is appreciated.

+1

algorithm random

Sridhar Iyer Dec 14. '08 at 19:07

source to share

3 answers

iammichael · Answer 1 · 2009-06-22T16:09:05+0000

Like the other answers, there are two main approaches: 1) keep track of what you have already generated (the suggested solutions in this category suffer, perhaps never end), or 2) keep track of which substitutions have not yet been created (which implies that the permutations must be pre-generated, which was specifically prohibited in the requirements). Here's another solution that is guaranteed to terminate and does not require pre-generation, but may not meet your randomization requirements (which are undefined at the moment).

General overview: Generate a tree to keep track of what has been created or what is left. "select" new permutations "by going through random tree references, clipping the tree in the sheets after generating this permutation to prevent it from being re-created.

Without a diagram board, this is, hopefully, a description good enough to describe what I mean: Create a "node" that has links to other nodes for every letter in the alphabet. This can be done using a generic map of the alphabet letters in the nodes, or if your alphabet is corrected, you can create specific links. node represents the available letters in the alphabet that can be "crafted" to generate a permutation. Start generating permutations by visiting the root node, picking a random letter from the available letters in the node, and then navigating that link to the next node. Each traversal creates a permutation letter. When the leaf is reached (ie, the permutation is fully built), you bounce the tree to see if the parent nodes have any permutations available; if not,the parent node can be trimmed.

As an implementation detail, a node can store a set of letters that are not available for creation at that point, or a set of letters that are still available for creation at that point. To possibly reduce storage requirements, you can also let node store either a flag indicating that it does so that when a node allows more than half of the alphabet, it keeps the letters released so far and switches to use the letters left over when available less than half of the alphabet.

Using such a tree structure limits what can be created without first generating all the combinations, since you do not need to pre-build the entire tree (it can be built as permutations are created), and you are guaranteed to complete due to cleanup of nodes (i.e. (i.e. you only traverse node references when it is a legal combination for an unproduced permutation).

I find the randomization of this technique a bit odd, and I don't think every combination can be generated equally at any given time, although I haven't thought about it. It's also probably worth noting that while the full tree is not necessarily generated up front, the overhead is likely to be sufficient, so you might be better off generating all the permutations.

Jennifer · Answer 2 · 2008-12-14T19:18:02+0000

I think you can do something quite simply by creating a random character array based on the alphabet you have (in C #):

        char[] alphabet = {'a', 'b', 'c', 'd'};
        int wordLength = 3;

        Random rand = new Random();

        for (int i = 0; i < 5; i++)
        {
            char[] word = new char[wordLength];
            for (int j = 0; j < wordLength; j++)
            {
                word[j] = alphabet[rand.Next(alphabet.Length)];
            }
            Console.WriteLine(new string(word));
        }

Obviously this can lead to duplicates, but you could store the results in a hashmap or check something for duplicates if you need to.

Erika · Answer 3 · 2009-06-22T15:21:14+0000

So, I believe you want to create a set permutation using as little memory as possible.

First, it is impossible to do this without memory. For your first line, you need a function that can produce any of the lines with equal probability. Let's say the function is called nextString (). If you call nextString () again without changing anything in the state, of course it can produce any of the strings again.

So, you need to store something. The question is what do you need to store and how much space do you need?

Strings can be thought of as numbers 0 - X ^ Y. (aaa = 0, aab = 1, aac = 2 ... aba = X ...) So, to store one string as efficiently as possible, you need the lg (X ^ Y). Let's say X = 16 and Y = 2. Then you need 1 byte of storage to uniquely identify the string.

Of course the most naive algorithm is to tag each line as it is created, which takes X ^ Y bits, which in my example is 256 bits (32 bytes). This is what you said you didn't want to do. You can use the shuffle algorithm as discussed in this question: Generating a random ordered list from an ordered list (you won't need to store strings when you create them through the shuffle algorithm, but you should mark them anyway).

Ok, now the question is, can we do better than this? How much do we need to store, total?

Well, on the first call, we don't need any storage. In the second call, we need to know which one was created earlier. On the last call, we only need to know which one is the last one left. So the worst case is when we're halfway there. When we were halfway there, 128 lines were released and there are 128 left. We need to know what to leave for production. Assuming the process is truly random, any split is possible. There are (256 options 128) possibilities. To potentially be able to store any of these, we need an lg bit (256 select 128), which is 251.67 according to the Google calculator. So if you were really smart, you could compress information by 4 fewer bits than a naive algorithm. Probably not worth it.

If you just want it to look randomized with very small storage, see this question: Looking for an algorithm to spit out a sequence of numbers in (pseudo) random order

Generating random fixed permutations of string length

More articles: