Keep random items without duplicates at a constant time

I have a array

in which I am outputting random

items

[a, b, c, d ,...]

function getRandomItem(){
   // return random item from array
}

      

I also have a SQL table like this:

category_id
random_item

      

Then I want to add this item to the table. Each category_id

requires multiple lines of random element, e.g .:

  • There are no duplicate elements in each category (element a cannot represent category category_id 1 twice, but element a can be in category_id

    1 and category_id

    2)
  • The number of elements will be less than the length of the array. (This is not a requirement that will always be the case.)

Here's some mock code that does exactly that:

function persist(){
    var a = giveRandomItem();
    // $1 = a
    return execute("INSERT INTO mytable (random_item) values ($1) ON CONFLICT DO NOTHING RETURNING *", a)
}

 // usage
 var persisted;
 while(persisted === undefined){
    persisted = persist();
 }

      

The problem is that this is not constant time. Chances are that I hit the DB 5 times in a row because the item is already saved.

For each category, I expect max 5k elements and my array length is 400,000. So the probability is pretty low.

I would like to find a way that is time constant nonetheless, or at least have an sql command that will try to use multiple values ​​to further reduce the likelihood.


Use case

A simple use case I can think of is this (useless but simple):

Users are presented with an interface where they can select a category. They can then click on a button that adds a random item to it. There are several users, each of whom acts individually. So user 1 can add a random item to category 1 while user 2 simultaneously adds a random item to category 2

EDIT

I ended up doing something like this:

At the application level:

shuffle(array);

function getRandomItem(seed, inc){
   let index = (seed + inc) % array.length;
   return array[index]
}

// usage:

let seed = item.category_id
let inc = category.item_count

      

This way I have no duplicates since I said the number of elements is less than the length of the array. Also, the items appear random as the category ID is used as the seed to start the increment. However, this is only for the starting point and therefore it is not random, but it works for my use.

+3


source to share


1 answer


To ensure that you don't experience conflicts (unique constraint violations), you must change your approach. Instead of generating one random element at a time, you should generate all 5K elements at the same time (and then insert them into the array). Bulk insertion will also speed up the process significantly.

How to generate 5K random elements from an array of 400K elements?

One way is to shuffle the array and take the first 5K elements. Then the next 5K elements, etc. This also ensures that individual batches do not have duplicate elements (until the entire 400K is exhausted and you start over from the beginning of the array).



If you want the elements to be able to repeat themselves between batches, then reshuffle the array between batches.


After discussion in the comments, it looks like you need an algorithm that generates circular permutations . For each category store in the DB, the seed / internal state of this algorithm needs to know how to continue collecting the 400K array elements in a way that looks random, but does not repeat until all 400K elements for the category have been selected.

+2


source







All Articles