How to weight a random number based on an array

I was thinking about how to implement something that is honestly beyond my math skills. So here, feel free to give it a try and point me in the right direction and not a complete code solution, any help I would be grateful for.

So, imagine that I analyzed the text and created a table of frequencies of different two-character combinations. I saved them in a 26x26 array. eg.

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A 1 15 (frequency of AA, then frequency of AB etc.)
B 12 0 (freq of BA, BB etc..)
... etc.

      

So, I want to randomly select these two-character combinations, but I would like to weigh my selection based on frequency. i.e. AB on top should be 15 times more likely than AA. And obviously the selection shouldn't return something like BB (i.e. Frequency 0 - in this example, BB is obviously found in words like Bubble !! :-)). For case 0, I realize I can loop until I get a non-0 frequency, but that just isn't graceful because I have a feeling / intuition that there is a way to skew my average.

I thought to pick the first char of my pair - that is. string - (I end up generating a 4-pair sequence) I could just use the system random function (Random class.Next) then use a "weighted" random algorithm to select the second char.

Any ideas?

+3


source to share


6 answers


Given your example example, I would first create a cumulative series of all numbers (1, 15, 12, 0 => 1, 16, 28, 28).

Then I would generate a random number between 0 and 27 (say 19).



Then I would calculate that 19 was> = 16 but 28, giving me a bucket of 3 (BA).

+5


source


There are some good suggestions in other answers for your specific problem. To solve the general problem "I have a source of random numbers that match a uniform probability distribution, but I would like it to match a given non-uniform probability distribution," then you can work out a quantile function, which is a function that performs this transformation. I am giving a gentle introduction explaining why the quantile function is the function you want here:



Generating random uneven data in C #

+5


source


How about summing all the frequencies and using AA to ZZ to generate your pair.

Suppose you have a common frequency of pairs, if rnd returns 0 you get AA, if it returns 1-14 then its AB, etc.

+1


source


Use a frequency matrix to generate a complete set of values. Order the set using Random.Next (). Store the randomized set in an array. Then you can just select an element if that array is based on Random.Next (randomarray.Length).

If there is a mathematical way to calculate the frequency, you can do it. But creating a precompiled and cached set will reduce the computation time if it is called repeatedly.

As a side note, depending on the maximum frequency, this may require a good amount of memory. You would also like to instantiate the random before the loop to create the set. This means that you are not reloading the random generator.

...

Another way (similar to what you suggested at the end of your question) would be to do it in two passes with the first row selection and the second using your weighted frequency to select the column. It will simply be the sum of the band-limited line frequencies. The first sentence should give a more even weight distribution.

+1


source


Let's take the sum of the probabilities. Take a random number between zero and this amount. Add probabilities until you get them greater than or equal to your random number. Then use the element on yours.

For example, pseudocode:

b = getProbabilites()
s = sum(b)
r = randomInt() % s
i = 0
acc = 0
while (acc < r) {
    acc += b[i]
    i++
}

return i

      

0


source


If efficiency is not an issue, you can create a key-> value hash instead of an array. The potential for this would be that (if you format the text correctly) it would be very easy to update the values ​​if the need arises. Something like

{
    AA => 5, AB => 2, AC => 4,
    BA => 6, BB => 5, BC => 9,
    CA => 2, CB => 7, CC => 8
}

      

With this, you can easily get the sequence you want and quickly find the record to update. If the table is auto-generated and extremely large it might help to get / be familiar with using vim regexes.

0


source







All Articles