How to efficiently select a random element from a Scala immutable HashSet

I have scala.collection.immutable.HashSet

one that I want to randomly select from an item.

I could solve the problem with an extension method like this:

implicit class HashSetExtensions[T](h: HashSet[T]) {
  def nextRandomElement (): Option[T] = {
    val list = h.toList
    list match {
      case null | Nil => None
      case _ => Some (list (Random.nextInt (list.length)))
    }
  }
}

      

... but converting to a list will be slow. What would be the most effective solution?

+3


source to share


2 answers


WARNING This answer is for experimental use only. For a real project, you should probably use your own collection types.

So, I've done some research on the source of the HashSet , and I think there is little opportunity to somehow extract the inner structure of the most valuable class HashTrieSet

without breaking the package.

I came up with this code which expanded on Ben Reich's solution :

package scala.collection

import scala.collection.immutable.HashSet
import scala.util.Random

package object random {
  implicit class HashSetRandom[T](set: HashSet[T]) {
    def randomElem: Option[T] = set match {
      case trie: HashSet.HashTrieSet[T] => {
        trie.elems(Random.nextInt(trie.elems.length)).randomElem
      }
      case _ => Some(set.size) collect {
        case size if size > 0 => set.iterator.drop(Random.nextInt(size)).next
      }
    }
  }
}

      



the file must be created somewhere in the folder src/scala/collection/random

Pay attention to the package scala.collection

- this thing makes the visible part elems

HashTrieSet

visible. This is just a solution that I would think that might work better than O(n)

. The current version should be O(ln(n))

as complex as any of the immutable.HashSet

s operations.

Another warning is that the private structure is HashSet

not part of the standard scala library API, so it can change any version, making this code in error (although it hasn't changed since 2.8)

+2


source


Since size

O(1)

on HashSet

, and is iterator

as lazy as possible, I think this solution will be relatively efficient:

implicit class RichHashSet[T](val h: HashSet[T]) extends AnyVal {
    def nextRandom: Option[T] = Some(h.size) collect {
        case size if size > 0 => h.iterator.drop(Random.nextInt(size)).next
    }
}

      

And if you're trying to get every ounce of efficiency you can use here match

instead of the more succinct Some/collect

idiom used here.



You can look mutable HashSet

to see the method size

. The method defined there iterator

simply calls iterator

on FlatHashTable

. The same basic effectiveness of these techniques applies to immutable HashSet

, if that's what you're working with. For comparison, you can see that the implementation toList

is not HashSet

completely type-hierarchical TraversableOnce

and uses much more primitive elements, which are probably less efficient and (of course) the entire collection must be iterated over to be generated List

. If you are going to convert an entire set to a collection Traversable

, you should use Array

or Vector

, which have a persistent search.

You may also notice that there is nothing special about the methods above HashSet

, and you could enrich Set[T]

instead if you chose that way (although there would be no guarantee that it would be as effective on others Set

, of course).

As a side note, when implementing rich classes for extension methods, you should always consider the implicit user-defined value class when extending AnyVal

. You can read some of the benefits and limitations in the docs and this answer .

+1


source







All Articles