Weighted random selection from a sorted list

I have a problem with a large list of products sorted by weight. I need to be able to randomly select items from this list, but items closer to the start (more weight) should have a greater chance of being selected based on the "elitism" factor.

I understand that similar questions have been asked before, but the bottom line is that this list will change over time. The new values ​​will be sorted in the list as the last item is removed (to keep the pool of "optimized" values ​​constant in size).

First, what is the most efficient way to choose? The selection should take place in real time from a list of 50 to 1000 items.

Second, what is the best data structure to use here? I am using C #.

I was just thinking about a possible solution, but I would like some feedback on this idea. What if I generated a random floating point value in a specific range and then did something like squaring it? Small values ​​will return small values, and large values ​​will return MUCH large values. From what I can tell, matching this result to the length of the list should give the desired effect. Does this sound right?

+3

source to share

4 answers

Create a binary tree sorted by weight (no sorting required, except as asked in the question) and for each node records the total weight of all children. At the top of this, we can calculate the total weight of the entire list.

Choose a random value `r`

between zero and the total weight of everything. In each node, if the weight of the current node is greater than `r`

then this is your result. Otherwise, subtract the weight of the current node from `r`

. Now, if the total weight of all the left children is less than `r`

then go left. Otherwise, subtract the total weight of all children on the left from `r`

and go right. Repeat until you get the result.

The cost of insertion and removal depends on how you decide to implement and balance your tree, but you will also have to traverse all ancestors to update their weights.

If you really don't need to sort it, making a bunch of it can improve performance.

+1

source

I would do something like:

``````string[] names = new[] { "Foo", "Bar", "Fix" };

// The weights will be 3, 2, 1
int[] weights = new int[names.Length];
for (int i = 0; i < names.Length; i++)
{
weights[i] = names.Length - i;
}

int[] cumulativeWeights = new int[names.Length];

// The cumulativeWeights will be 3, 5, 6
// so if we generate a number, 1-3 Foo, 4-5 Bar, 6 Fiz
cumulativeWeights[0] = weights[0];
int totalWeight = weights[0];

for (int i = 1; i < cumulativeWeights.Length; i++)
{
cumulativeWeights[i] = cumulativeWeights[i - 1] + weights[i];
totalWeight += weights[i];
}

var rnd = new Random();

while (true)
{
int selectedWeight = rnd.Next(totalWeight) + 1; // random returns 0..5, +1 == 1..6
int ix = Array.BinarySearch(cumulativeWeights, selectedWeight);
// If value is not found and value is less than one or more
// elements in array, a negative number which is the bitwise
// complement of the index of the first element that is
// larger than value.
if (ix < 0)
{
ix = ~ix;
}

Console.WriteLine(names[ix]);
}
```

(adsbygoogle = window.adsbygoogle || []).push({});

```

I have built an array `weight`

. I used the linear method. The first item has a weight (number of items), the second has weight (number of items - 1), etc. You can use your own algorithm, but it is easier if the weight is an integer.

Then I calculated the array `cumulativeWeights`

and `totalWeight`

.

Then I can extract the binary number between `1`

and `totalWeight`

and find the index that it has `cumulativeWeight`

, which is <= random number. Being `cumulativeWeights`

sorted (clearly :-)) I can use `Array.BinarySearch`

, which has the advantage that if the exact number is not found, the index for the next largest number is given.

Now, with `double`

`weight`

, it gets a little more complicated for the part `Random`

:

``````string[] names = new[] { "Foo", "Bar", "Fix" };

// The weights will be 3.375, 2.25, 1.5
double[] weights = new double[names.Length];
for (int i = 0; i < names.Length; i++)
{
weights[i] = Math.Pow(1.5, names.Length - i);
}

double[] cumulativeWeights = new double[names.Length];

// The cumulativeWeights will be 3.375, 3.375+2.25=5.625, 3.375+2.25+1.5=7.125
// so if we generate a number, 1-3.375 Foo, >3.375-5.625 Bar, >5.625-7.125 Fiz
// totalWeight = 7.125
cumulativeWeights[0] = weights[0];
double totalWeight = weights[0];

for (int i = 1; i < cumulativeWeights.Length; i++)
{
cumulativeWeights[i] = cumulativeWeights[i - 1] + weights[i];
totalWeight += weights[i];
}

var rnd = new Random();

while (true)
{
// random returns (0..1 * totalWeight - 1) + 1 = (0...6.125) + 1 = 1...7.125
double selectedWeight = (rnd.NextDouble() * (totalWeight - 1)) + 1;

int ix = Array.BinarySearch(cumulativeWeights, selectedWeight);
// If value is not found and value is less than one or more
// elements in array, a negative number which is the bitwise
// complement of the index of the first element that is
// larger than value.
if (ix < 0)
{
ix = ~ix;
}

Console.WriteLine(names[ix]);
}
```

(adsbygoogle = window.adsbygoogle || []).push({});

```

The method `Random.NextDouble()`

returns a number `0<=x<1`

that we need to convert to our weight.

Based on this principle, you can build a class `List<T>`

that uses it:

``````public class ListWithWeight<T>
{
private readonly List<T> List = new List<T>();

private readonly List<double> CumulativeWeights = new List<double>();

private readonly Func<int, double> WeightForNthElement;

private readonly Random Rnd = new Random();

public ListWithWeight(Func<int, double> weightForNthElement)
{
WeightForNthElement = weightForNthElement;
}

public void Add(T element)
{
List.Add(element);

double weight = WeightForNthElement(List.Count);

if (CumulativeWeights.Count == 0)
{
CumulativeWeights.Add(weight);
}
else
{
CumulativeWeights.Add(CumulativeWeights[CumulativeWeights.Count - 1] + weight);
}
}

public void Insert(int index, T element)
{
List.Insert(index, element);

double weight = WeightForNthElement(List.Count);

if (CumulativeWeights.Count == 0)
{
CumulativeWeights.Add(weight);
}
else
{
CumulativeWeights.Add(CumulativeWeights[CumulativeWeights.Count - 1] + weight);
}
}

public void RemoveAt(int index)
{
List.RemoveAt(index);
CumulativeWeights.RemoveAt(List.Count);
}

public T this[int index]
{
get
{
return List[index];
}

set
{
List[index] = value;
}
}

public int Count
{
get
{
return List.Count;
}
}

public int RandomWeightedIndex()
{
if (List.Count < 2)
{
return List.Count - 1;
}

double totalWeight = CumulativeWeights[CumulativeWeights.Count - 1];
double selectedWeight = (Rnd.NextDouble() * (totalWeight - 1.0)) + 1;

int ix = CumulativeWeights.BinarySearch(selectedWeight);
// If value is not found and value is less than one or more
// elements in array, a negative number which is the bitwise
// complement of the index of the first element that is
// larger than value.
if (ix < 0)
{
ix = ~ix;
}

// We want to use "reversed" weight, where first items
// weight more:

ix = List.Count - ix - 1;
return ix;
}
}
```

(adsbygoogle = window.adsbygoogle || []).push({});

```

and

``````var lst = new ListWithWeight<string>(x => Math.Pow(1.5, x));
lst.Add("Foo");
lst.Add("Bar");
lst.Add("Fix");
lst.RemoveAt(0);
lst.Insert(0, "Foo2");

while (true)
{
Console.WriteLine(lst[lst.RandomWeightedIndex()]);
}
```

(adsbygoogle = window.adsbygoogle || []).push({});

```
+1

source

This is what I would do:

``````private static int GetPosition(double value, int startPosition, int maxPosition, double weightFactor, double rMin)
{
while (true)
{
if (startPosition == maxPosition) return maxPosition;

var limit = (1 - rMin)*weightFactor + rMin;
if (value < limit) return startPosition;
startPosition = startPosition + 1;
rMin = limit;
}
}

static void Main()
{
const int maxIndex = 100;
const double weight = 0.1;

var r = new Random();
for (var i = 0; i < 200; i++)
Console.Write(GetPosition(r.NextDouble(), 0, maxIndex, weight, 0) + " ");
}
```

(adsbygoogle = window.adsbygoogle || []).push({});

```

A weighting factor of 0.1 means that the first item has a 10% chance of being selected. All other items have 90%.

The second item has 10% of the remaining 90% = 9%

The third subject has 10% of the remaining 81% = 8.1%

...

As the weighting increases, it is more likely that the first positions will be selected over the last in the list. With a factor of 1, only the 1st element is selected.

For weights 0.1 and 10 points, here are the probabilities for each index:

``````0: 10%
1: 9%
2: 8.1%
3: 7.29%
4: 6.56%
5: 5.9%
6: 5.31%
7: 4.78%
8: 4.3%
9: 3.87%
```

(adsbygoogle = window.adsbygoogle || []).push({});

```

EDIT

Of course this will only work for many indices (at least 10 for 0.1), otherwise it will give high probabilities for the last index. For example, if weight = 0.1 and maxIndex = 1, index 0 will have 10% probability, but index 1 will have 90%.

+1

source

Unfortunately, I can't suggest any code right now, but some ideas:

Since your list is sorted from high weight to low weight, you should be able to use a random number generator based on a normal distribution. If you don't have such a random number generator, you can convert the uniform distribution to the normal distribution using the code found here: Random Gaussian Variables

I explain horribly, but I'll try: You can define the bias (mean) by 0 and the sigma (deviation) by, say 3. Then you take the absolute value from the generated number, since you can get negative numbers.

This will give you a number generator that is likely to have a maximum offset number (0 in the above example) and less likely for numbers that are significantly different from them.

As I said, I explain horribly

+1

source

All Articles